Abstract. We propose in this work an original estimator of the conditional intensity
of a marker-dependent counting process, that is, a counting process with covariates.
We use model selection methods and provide a nonasymptotic bound for the risk of
our estimator on a compact set. We show that our estimator automatically reaches a
convergence rate over a functional class with a given (unknown) anisotropic regularity.
Then, we prove a lower bound which establishes that this rate is optimal. Lastly, we
provide a short illustration of the way the estimator works in the context of conditional
hazard estimation.
1. Introduction

As counting processes can model a great diversity of observations, especially in medicine,
actuarial science or economics, their statistical inference has received continuous attention
for half a century; see Andersen et al. (1993) for the most detailed presentation of
the subject. In this paper, we propose a new strategy, based on model selection, for the
inference for counting processes in the presence of covariates. The model considered can be
described as follows.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_t)_{t \ge 0}$ a filtration satisfying the usual conditions.
Let $N$ be a marker-dependent counting process, with compensator $\Lambda$ with respect
to $(\mathcal{F}_t)_{t \ge 0}$, such that $N - \Lambda = M$, where $M$ is a $(\mathcal{F}_t)_{t \ge 0}$-martingale. We assume that $N$ is
a marker-dependent counting process satisfying the Aalen multiplicative intensity model
in the sense that:

(1)  $$\Lambda(t) = \int_0^t \alpha(X, z)\, Y(z)\, dz, \quad \text{for all } t \ge 0,$$

where $X$ is a vector of covariates in $\mathbb{R}^d$ which is $\mathcal{F}_0$-measurable, the process $Y$ is nonnegative
and predictable and $\alpha$ is an unknown deterministic function called intensity.

The purpose of this paper is to estimate the intensity function $\alpha$ on the basis of the
observation of an $n$-sample $(X_i, N^i(z), Y^i(z), z \le \tau)$ for $i = 1, \dots, n$, where $\tau < +\infty$.



MAPS, University Paris Descartes, France, email: fabienne.comte@parisdescartes.fr, 
LSTA University Pierre et Marie Curie, France, email: stephane.gaiffas@upmc.fr, 
LSTA University Pierre et Marie Curie, France, email: agathe.guilloux@upmc.fr. 






F. COMTE, S. GAIFFAS & A. GUILLOUX 



There are many examples, crucial in practice, which fulfill this model. For the sake of
conciseness, we restrict our presentation to the following three.

Example 1 (Regression model for right-censored data). Let $T$ be a nonnegative random
variable (r.v.) and $X$ a vector of covariates in $\mathbb{R}^d$, with respective cumulative distribution
functions (c.d.f.) $F_T$ and $F_X$. We consider in addition that $T$ can be censored. We
introduce the nonnegative r.v. $C$, with c.d.f. $G$, such that the observable r.v. are
$Z = T \wedge C$, $\delta = 1(T \le C)$ and $X$. We assume that:

(C): $T$ and $C$ are independent conditionally on $X$.

In this case, the processes to consider (see e.g. Andersen et al. (1993)) are given, for
$i = 1, \dots, n$ and $z \ge 0$, by:

$$N^i(z) = 1(Z_i \le z, \delta_i = 1) \quad \text{and} \quad Y^i(z) = 1(Z_i \ge z).$$

The unknown intensity function $\alpha$ to be estimated is the conditional hazard rate of the
r.v. $T$ given $X = x$ defined, for all $z > 0$, by:

$$\alpha(x, z) = \alpha_{T|X}(x, z) = \frac{f_{T|X}(x, z)}{1 - F_{T|X}(x, z)},$$

where $f_{T|X}$ and $F_{T|X}$ are respectively the conditional probability density function (p.d.f.)
and the conditional c.d.f. of $T$ given $X$.
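As a small illustration of the objects of Example 1, the following Python sketch builds the observed pair $(Z_i, \delta_i)$ from a lifetime and a censoring time, and evaluates $N^i(z)$ and $Y^i(z)$ at a given time point. The function names `observe` and `counting_processes` are hypothetical and only serve this illustration.

```python
def observe(t, c):
    """Return (Z, delta) with Z = min(T, C) and delta = 1(T <= C)."""
    z_obs = min(t, c)
    delta = 1 if t <= c else 0
    return z_obs, delta


def counting_processes(z_obs, delta, z):
    """Return (N(z), Y(z)) for one right-censored observation (Z, delta).

    N(z) = 1(Z <= z, delta = 1) counts an observed (uncensored) event by time z.
    Y(z) = 1(Z >= z) indicates that the individual is still at risk at time z.
    """
    n_z = 1 if (z_obs <= z and delta == 1) else 0
    y_z = 1 if z_obs >= z else 0
    return n_z, y_z
```

For instance, an individual with lifetime 0.3 and censoring time 0.5 is observed with $(Z, \delta) = (0.3, 1)$; at time 0.2 it is still at risk, and at time 0.4 its event has been counted.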

Nonparametric estimation of the hazard rate in the presence of covariates was initiated
by Beran (1981). Stute (1986), Dabrowska (1987), McKeague and Utikal (1990) and
Li and Doss (1995) extended his results. Many authors have considered semiparametric
models; see Andersen et al. (1993). We refer to Huang (1999) and Linton et al. (2003)
for some recent developments.

As far as we know, adaptive nonparametric estimation for censored data in the presence of
covariates has only been considered in Brunel et al. (2007), who constructed an optimal
adaptive estimator of the conditional density.



Example 2 (Cox processes). Let $\eta^i$, for $i = 1, \dots, n$, be a Cox process (see Karr (1986))
on $\mathbb{R}_+$ with random mean-measure $\Lambda^i$ given by:

$$\Lambda^i(t) = \int_0^t \alpha(X_i, z)\, dz,$$

where $X_i$ is a vector of covariates in $\mathbb{R}^d$. In this context the predictable process $Y$ of
Equation (1) constantly equals 1. As a consequence, these processes can be seen as
generalizations of nonhomogeneous Poisson processes on $\mathbb{R}_+$ with random intensities. This
is a particular case of longitudinal data, see e.g. Example VII.2.15 in Andersen et al.
(1993). The nonparametric estimation of the intensity of Poisson processes without
covariates has been considered in several papers. We refer to Reynaud-Bouret (2003) and
Baraud and Birgé (2006) for the adaptive estimation of the intensity of nonhomogeneous
Poisson processes in general spaces.

Example 3 (Regression model for transition intensities of Markov processes). Consider
an $n$-sample of nonhomogeneous time-continuous Markov processes $P^1, \dots, P^n$ with finite
state space $\{1, \dots, k\}$ and denote by $\alpha_{jl}$ the transition intensity from state $j$ to state $l$. For
individual $i$ with covariate $X_i$, let $N^i_{jl}(t)$ be the number of observed direct transitions from
$j$ to $l$ before time $t$ (we allow the possibility of right-censoring for example). Conditionally
on the initial state, the counting process $N^i_{jl}$ verifies the following Aalen multiplicative
intensity model:

$$N^i_{jl}(t) = \int_0^t \alpha_{jl}(X_i, z)\, Y^i_j(z)\, dz + M^i(t) \quad \text{for all } t \ge 0,$$

where $Y^i_j(t) = 1\{P^i(t-) = j\}$ for all $t \ge 0$, see Andersen et al. (1993) or Jacobsen (1982).
This setting is discussed in Andersen et al. (1993), see Example VII.11 on mortality and
nephropathy for insulin dependent diabetics.



We finally cite three papers in which different strategies for the estimation of the intensity
of counting processes are considered. They cover all the previous examples,
but none of them considers the presence of covariates. Ramlau-Hansen (1983)
proposed a kernel-type estimator, and Grégoire (1993) studied cross-validation for these
estimators. More recently, Reynaud-Bouret (2006) considered adaptive estimation by model
selection.

Our aim in this work is to provide an optimal adaptive nonparametric estimator of the
conditional intensity. Our estimation procedure involves the minimization of a so-called
contrast. To achieve that purpose, we proceed as follows. In Section 2 we describe the
estimation procedure: we explain how the contrast is built, on which collections of spaces
the estimators are defined and how the relevant space is selected via a data-driven penalized
criterion. In Section 3 we state an oracle inequality for our estimator (see Theorem 1),
a resulting upper bound (see Corollary 1) and a lower bound (see Theorem 2); the latter
asserts the optimality in the minimax sense. An auxiliary estimation of the density of
the reference measure is also studied. The examples of Section 4 are taken in the setting
of Example 1 in order to provide a short illustration of the practical properties of our
estimator. Lastly, proofs are gathered in Sections 5, 6 and 7. We mention that the deviation
inequalities proved in Section 6 may be of intrinsic interest.

Remark 1. An inherent feature of this model is that there is no reason for the conditional
intensity $\alpha(x, z)$ to have the same behavior with respect to the $z$ (time) and $x$ (covariates)
variables. This is the reason why it is mandatory in our purely nonparametric setting
to consider anisotropic regularity for $\alpha$. Think for instance of the very popular
proportional hazards Cox model, see Cox (1972): it is assumed that $\alpha(x, z) = \alpha_0(z) \exp(\beta^\top x)$
for some unknown function $\alpha_0$ and unknown vector $\beta \in \mathbb{R}^d$. Of course, in this model, the
smoothness in the $x$ direction is higher than in the $z$ direction.



For the sake of simplicity, we will assume in the following that the covariate $X$ is
one-dimensional. Similar procedures and results for multivariate covariates are an almost
effortless extension, as discussed in Remark 3.



2. Description of the procedure 



Our estimation procedure involves the minimization of a contrast. This contrast is 
tuned to the problem considered in this paper, as explained in the next section. 
2.1. Definition of the contrast. Let $A = A_1 \times A_2$ be a compact set of $\mathbb{R} \times \mathbb{R}_+$ on which
the function $\alpha$ will be estimated. Without loss of generality, we set $A = [0, 1] \times [0, 1]$, and
in particular $\tau = 1$. Let $h$ be a function in $(L^2 \cap L^\infty)(A)$. Define the contrast function:

(2)  $$\gamma_n(h) = \frac{1}{n} \sum_{i=1}^n \int_0^1 h^2(X_i, z)\, Y^i(z)\, dz - \frac{2}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, dN^i(z).$$



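This contrast is of least-squares type adapted to the problem considered here. Since each
$N^i$ admits a Doob-Meyer decomposition ($N^i = \Lambda^i + M^i$), we have:

$$\gamma_n(h) = \frac{1}{n} \sum_{i=1}^n \int_0^1 h^2(X_i, z)\, Y^i(z)\, dz - \frac{2}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, d\Lambda^i(z) - \frac{2}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, dM^i(z),$$

so that

$$E(\gamma_n(h)) = E\Big( \int_0^1 h^2(X, z)\, Y(z)\, dz \Big) - E\Big( 2 \int_0^1 h(X, z)\, d\Lambda(z) \Big).$$

Let $F_X$ denote the c.d.f. of the covariate $X$ and $\|\cdot\|_\mu$ the norm defined by:

$$\|h\|_\mu^2 := E\Big( \int_0^1 h^2(X, z)\, Y(z)\, dz \Big) = \iint_A h^2(x, z)\, d\mu(x, z),$$

where $d\mu(x, z) := E(Y(z) | X = x)\, F_X(dx)\, dz$. By the Aalen multiplicative intensity model,
see Equation (1), we get:

$$E(\gamma_n(h)) = \|h\|_\mu^2 - 2 \iint_A h(x, z)\, \alpha(x, z)\, E(Y(z) | X = x)\, F_X(dx)\, dz = \|h - \alpha\|_\mu^2 - \|\alpha\|_\mu^2.$$

This explains why minimizing $\gamma_n(\cdot)$ over an appropriate set of functions, described below,
is a relevant strategy to estimate $\alpha$.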

Example 1 continued. In the particular case of regression for right-censored data, the
conditional hazard function is estimated and the contrast function has the following form:

$$\gamma_n(h) = \frac{1}{n} \sum_{i=1}^n \int_0^1 h^2(X_i, z)\, 1(Z_i \ge z)\, dz - \frac{2}{n} \sum_{i=1}^n \delta_i\, h(X_i, Z_i).$$

We have in addition an explicit formula for $d\mu(x, z)$:

$$d\mu(x, z) = (1 - L_{Z|X}(z, x))\, F_X(dx)\, dz,$$

where

$$1 - L_{Z|X}(z, x) := P(Z \ge z | X = x) = (1 - F_{T|X}(x, z))(1 - G_{C|X}(x, z))$$

and $G_{C|X}$ is the conditional c.d.f. of $C$ given $X$.
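The right-censored form of the contrast can be evaluated numerically for any candidate function $h$. The sketch below is only an illustration under stated assumptions: the function name `contrast` and the midpoint quadrature with `grid` points are choices made here, not part of the paper, and $h$ is assumed to vanish outside $A = [0,1] \times [0,1]$.

```python
def contrast(h, x, z, delta, grid=200):
    """Evaluate the least-squares contrast gamma_n(h) for right-censored data.

    h     : function (x, z) -> float, assumed supported on [0, 1] x [0, 1]
    x     : covariates X_i
    z     : observed times Z_i = min(T_i, C_i)
    delta : censoring indicators delta_i = 1(T_i <= C_i)
    The time integral over [0, 1] is approximated by a midpoint rule.
    """
    n = len(x)
    step = 1.0 / grid
    total = 0.0
    for xi, zi, di in zip(x, z, delta):
        # (1/n) * integral of h^2(X_i, s) * 1(Z_i >= s) ds over [0, 1]
        quad = sum(
            h(xi, (k + 0.5) * step) ** 2
            for k in range(grid)
            if (k + 0.5) * step <= zi
        ) * step
        # integral of h(X_i, s) dN^i(s) over [0, 1]: a single jump at Z_i if uncensored
        jump = di * h(xi, zi) if zi <= 1.0 else 0.0
        total += quad - 2.0 * jump
    return total / n
```

With a constant candidate $h \equiv c$, the contrast reduces to $c^2$ times the average time at risk on $[0,1]$ minus $2c$ times the proportion of uncensored events in $[0,1]$, which gives a simple sanity check.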

Remark 2. In our setting, it is possible to let the censoring depend on the covariates, as
in Dabrowska (1989) or, more recently, Heuchenne and Van Keilegom (2006). Assumption
(C) above is weaker than the assumption: $T$ and $C$ are independent and $P(T \le C | X, T) =
P(T \le C | T)$ in Stute (1996).
2.2. Assumptions and notations. Before defining the estimation procedure, we need
to introduce some assumptions and notations. Define the norms

$$\|h\|^2 := \iint h^2(x, z)\, dx\, dz, \quad \|h\|_A^2 := \iint_A h^2(x, z)\, dx\, dz \quad \text{and} \quad \|h\|_{\infty, A} := \sup_{(x, z) \in A} |h(x, z)|,$$

and assume that the following holds:

• (A1) The covariates $X_i$ admit a p.d.f. $f_X$ such that $\sup_{A_1} |f_X| < +\infty$.

Assumption (A1) implies that $\mu$ admits a density w.r.t. the Lebesgue measure. We denote
by $f$ this density:

(3)  $$d\mu(x, z) = f(x, z)\, dx\, dz \quad \text{where} \quad f(x, z) = E(Y(z) | X = x)\, f_X(x).$$

We also assume:

• (A2) There exists $f_0 > 0$ such that $\forall (x, z) \in A_1 \times A_2$, $f(x, z) \ge f_0$.

• (A3) $\forall (x, z) \in A_1 \times A_2$, $\alpha(x, z) \le \|\alpha\|_{\infty, A} < +\infty$.

• (A4) $\forall i, \forall t$, $Y^i(t) \le C_Y$ where $C_Y$ is a known fixed constant.

Note that in the examples described in Section 1, Assumption (A4) is clearly fulfilled with
$C_Y = 1$. We will set $C_Y = 1$ in the following for simplicity.

2.3. Definition of the estimator. We use the usual model selection paradigm (see,
for instance, Massart (2007)): first minimize the contrast $\gamma_n(\cdot)$ over a finite-dimensional
function space $S_m$, then select the appropriate space by penalization. We introduce a
collection $\{S_m, m \in \mathcal{M}_n\}$ of projection spaces: $S_m$ is called a model and $\mathcal{M}_n$ is a set of
multi-indexes (see the examples in Section 2.4). For each $m = (m_1, m_2)$, the space $S_m$ of
functions with support in $A = A_1 \times A_2$ is defined by:

$$S_m = F_{m_1} \otimes H_{m_2} = \Big\{ h, \ h(x, z) = \sum_{j \in J_m} \sum_{k \in K_m} a^m_{j,k}\, \varphi^m_j(x)\, \psi^m_k(z), \ a^m_{j,k} \in \mathbb{R} \Big\},$$

where $F_{m_1}$ and $H_{m_2}$ are subspaces of $(L^2 \cap L^\infty)(\mathbb{R})$ respectively spanned by two orthonormal
bases $(\varphi^m_j)_{j \in J_m}$ with $|J_m| = D_{m_1}$ and $(\psi^m_k)_{k \in K_m}$ with $|K_m| = D_{m_2}$. For all $j$ and
all $k$, the supports of $\varphi^m_j$ and $\psi^m_k$ are respectively included in $A_1$ and $A_2$. Here $j$ and $k$
are not necessarily integers; they can be couples of integers, as in the case of a piecewise
polynomial space, see Section 2.4.



Remark 3. From a theoretical point of view, we could consider that the covariates $X$ are
in $\mathbb{R}^d$ and even that their density has an anisotropic regularity. To this end, we would
have to consider models of the form $S_m = F_{m_1} \otimes H_{m_2} \otimes \cdots \otimes H_{m_{d+1}}$. However, this would
make the proofs more intricate. Notice also that the convergence rate would be slower because
of the curse of dimensionality. For the sake of clarity, we deliberately restrict ourselves to
$X \in \mathbb{R}$.

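The first step would be to define $\hat\alpha_m = \arg\min_{h \in S_m} \gamma_n(h)$. To that end, let $h(x, y) =
\sum_{j \in J_m} \sum_{k \in K_m} a_{j,k}\, \varphi^m_j(x)\, \psi^m_k(y)$ be a function in $S_m$. To compute $\hat\alpha_m$, we have to solve:

$$\forall j_0, \forall k_0, \quad \frac{\partial \gamma_n(h)}{\partial a_{j_0, k_0}} = 0 \iff G_m A_m = \Upsilon_m,$$

where $A_m$ denotes the matrix $(a_{j,k})_{j \in J_m, k \in K_m}$,

$$G_m := \Big( \frac{1}{n} \sum_{i=1}^n \varphi^m_j(X_i)\, \varphi^m_l(X_i) \int_0^1 \psi^m_k(z)\, \psi^m_p(z)\, Y^i(z)\, dz \Big)_{(j,k),(l,p) \in J_m \times K_m}$$

and

$$\Upsilon_m := \Big( \frac{1}{n} \sum_{i=1}^n \varphi^m_j(X_i) \int_0^1 \psi^m_k(z)\, dN^i(z) \Big)_{j \in J_m, k \in K_m}.$$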

Unfortunately $G_m$ may not be invertible. To overcome this problem, we modify the
definition of $\hat\alpha_m$ in the following way:

(4)  $$\hat\alpha_m := \begin{cases} \arg\min_{h \in S_m} \gamma_n(h) & \text{on } \hat\Gamma_m, \\ 0 & \text{on } \hat\Gamma_m^c, \end{cases}$$

where

$$\hat\Gamma_m := \big\{ \min \mathrm{Sp}(G_m) \ge \max(\hat f_0 / 3, n^{-1/2}) \big\}$$

and $\mathrm{Sp}(G_m)$ denotes the spectrum of $G_m$, i.e. the set of the eigenvalues of the matrix
$G_m$ (it is easy to see that they are nonnegative). The estimator $\hat f_0$ of $f_0$ (the minimum of
the density $f$, see (A2)) is required to fulfill the following assumption:

• (A5) For any integer $k \ge 1$, $P(|\hat f_0 - f_0| > f_0 / 2) \le C_k / n^k$.

An estimator satisfying (A5) is defined in Section 3.4. In fact, $k = 7$ is enough for the
proofs. We refer the reader to the proof of Lemma 1, see Section 7, for an explanation of
the presence of $n^{-1/2}$ in the definition of $\hat\Gamma_m$. In practice, this constraint is generally not
used (the matrix is invertible, otherwise another model is considered).
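To make the linear system $G_m A_m = \Upsilon_m$ concrete, here is a minimal Python sketch for right-censored data (Example 1) with histogram bases in both directions. In that special case the basis functions have disjoint supports, so $G_m$ is diagonal and the minimizer of the contrast on each cell is the number of observed events divided by the total time at risk in the cell. The function name `histogram_hazard` and the argument `threshold` (standing in for $\max(\hat f_0/3, n^{-1/2})$) are assumptions made for this illustration.

```python
def histogram_hazard(x, z, delta, m1, m2, threshold=0.0):
    """Contrast minimizer on the histogram model with 2**m1 x 2**m2 cells.

    x, z, delta : covariates, observed times Z_i = min(T_i, C_i), indicators.
    Returns a nested list of estimated hazard values, one per cell of
    [0, 1) x [0, 1), or the zero function if the smallest eigenvalue of the
    (diagonal) Gram matrix G_m falls below `threshold`.
    """
    d1, d2 = 2 ** m1, 2 ** m2
    n = len(x)
    events = [[0.0] * d2 for _ in range(d1)]
    exposure = [[0.0] * d2 for _ in range(d1)]
    for xi, zi, di in zip(x, z, delta):
        if not 0.0 <= xi < 1.0:
            continue  # observation outside the estimation set A
        j = int(xi * d1)
        for k in range(d2):
            lo, hi = k / d2, (k + 1) / d2
            # time spent at risk inside [lo, hi): length of [lo, hi) intersected with [0, Z_i]
            exposure[j][k] += max(0.0, min(zi, hi) - lo)
        if di == 1 and 0.0 <= zi < 1.0:
            events[j][int(zi * d2)] += 1.0
    # Diagonal entries of G_m: (1/n) sum_i phi_j(X_i)^2 * integral of psi_k^2 * Y^i
    gram_diag = [d1 * d2 * exposure[j][k] / n for j in range(d1) for k in range(d2)]
    smallest = min(gram_diag)
    if smallest <= 0.0 or smallest < threshold:
        return [[0.0] * d2 for _ in range(d1)]
    return [[events[j][k] / exposure[j][k] for k in range(d2)] for j in range(d1)]
```

For other bases $G_m$ is not diagonal and a general linear solver would be needed; the diagonal case is shown here only because it keeps the computation transparent.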
The final step is to select the relevant space via the penalized criterion:

(5)  $$\hat m = \arg\min_{m \in \mathcal{M}_n} \big( \gamma_n(\hat\alpha_m) + \mathrm{pen}(m) \big),$$

where $\mathrm{pen}(m)$ is defined in Theorem 1 below, see Section 3. Our estimator of $\alpha$ on $A$ is
then $\hat\alpha_{\hat m}$.

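2.4. Assumptions on the models and examples. Let us introduce the following set of
assumptions on the models $\{S_m : m \in \mathcal{M}_n\}$, which are usual in model selection techniques.

• (M1) For $i = 1, 2$, $\mathcal{D}_n^{(i)} := \max_{m \in \mathcal{M}_n} D_{m_i} \le n^{1/4} / \sqrt{\log n}$.

• (M2) There exist positive reals $\phi_1, \phi_2$ such that, for all $u$ in $F_{m_1}$ and for all $v$ in
$H_{m_2}$, we have

$$\sup_{x \in A_1} |u(x)|^2 \le \phi_1 D_{m_1} \int_{A_1} u^2 \quad \text{and} \quad \sup_{x \in A_2} |v(x)|^2 \le \phi_2 D_{m_2} \int_{A_2} v^2.$$

By letting $\phi_0 = \sqrt{\phi_1 \phi_2}$, that leads to

(6)  $$\forall h \in S_m, \quad \|h\|_{\infty, A} \le \phi_0 \sqrt{D_{m_1} D_{m_2}}\, \|h\|_A.$$

• (M3) Nesting condition:

$$D_{m_1} \le D_{m_1'} \Rightarrow F_{m_1} \subset F_{m_1'} \quad \text{and} \quad D_{m_2} \le D_{m_2'} \Rightarrow H_{m_2} \subset H_{m_2'}.$$

Moreover, there exists a global nesting space $\mathcal{S}_n$ in the collection, such that $\forall m \in
\mathcal{M}_n$, $S_m \subset \mathcal{S}_n$ and $\dim(\mathcal{S}_n) := N_n \le \sqrt{n} / \log n$.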
Assumptions (M1)-(M3) are not too restrictive. Indeed, they are verified for the spaces
$F_{m_1}$ (and $H_{m_2}$) on $A_1 = [0, 1]$ spanned by the following bases (see Barron et al. (1999)):

• [T] Trigonometric basis: $\mathrm{span}(\varphi_0, \dots, \varphi_{m_1 - 1})$ with $\varphi_0 = 1([0, 1])$, $\varphi_{2j}(x) = \sqrt{2}
\cos(2\pi j x)\, 1([0, 1])(x)$, $\varphi_{2j-1}(x) = \sqrt{2} \sin(2\pi j x)\, 1([0, 1])(x)$ for $j \ge 1$. For this
model $D_{m_1} = m_1$ and $\phi_1 = 2$ hold.

• [DP] Regular piecewise polynomial basis: polynomials of degree $0, \dots, r$ (where
$r$ is fixed) on each interval $[(l - 1)/2^D, l/2^D[$ with $l = 1, \dots, 2^D$. In this case, we
have $m_1 = (D, r)$, $J_m = \{ j = (l, d),\ 1 \le l \le 2^D,\ 0 \le d \le r \}$, $D_{m_1} = (r + 1) 2^D$ and
$\phi_1 = \sqrt{r + 1}$.

• [W] Regular wavelet basis: $\mathrm{span}(\Psi_{l,k},\ l = -1, \dots, m_1,\ k \in \Lambda(l))$ where the $\Psi_{-1,k}$ are the
translates of the father wavelet and $\Psi_{l,k}(x) = 2^{l/2} \Psi(2^l x - k)$ where $\Psi$ is the
mother wavelet. We assume that the supports of the wavelets are included in $A_1$
and that $\Psi$ belongs to the Sobolev space $W_2^r$, see Härdle et al. (1998).

• [H] Histogram basis: for $A_1 = [0, 1]$, $\mathrm{span}(\varphi_1, \dots, \varphi_{2^{m_1}})$ with $\varphi_j = 2^{m_1/2}\, 1([(j -
1)/2^{m_1}, j/2^{m_1}[)$ for $j = 1, \dots, 2^{m_1}$. Here $D_{m_1} = 2^{m_1}$, $\phi_1 = 1$. Notice that [H] is a
particular case of both [DP] and [W].
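The histogram basis [H] is the simplest of the four, and its properties can be checked numerically. The following sketch (with hypothetical helper names `histogram_basis` and `inner_product`, and a midpoint quadrature chosen for this illustration) builds $\varphi_j = 2^{m/2}\, 1([(j-1)/2^m, j/2^m[)$ and verifies orthonormality on $[0, 1]$ as well as the sup-norm relation of (M2) with $\phi_1 = 1$.

```python
def histogram_basis(m, j):
    """Return the j-th histogram basis function (j = 1, ..., 2**m) on [0, 1]."""
    d = 2 ** m
    height = d ** 0.5  # normalization so that the L2 norm on [0, 1] equals 1

    def phi(x):
        return height if (j - 1) / d <= x < j / d else 0.0

    return phi


def inner_product(f, g, grid=1024):
    """Midpoint-rule approximation of the L2([0, 1]) inner product of f and g."""
    step = 1.0 / grid
    return sum(f((i + 0.5) * step) * g((i + 0.5) * step) for i in range(grid)) * step
```

With `m = 3`, the squared supremum of each basis function is $2^3 = 8 = \phi_1 D_{m_1}$ times its squared $L^2$ norm, so (M2) holds with equality for the basis functions themselves.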

Remark 4. The first assumption prevents the dimension from being too large compared to
the number of observations. We can considerably relax this constraint for localized
bases: for histogram bases, piecewise polynomial bases and wavelets, (M1) reduces to
$\mathcal{D}_n^{(i)} \le \sqrt{n / \log n}$. Analogously in (M3), we would get $N_n \le n / \log n$. The condition (M2)
implies a useful link between the $L^2$ norm and the infinite norm. The third assumption
(M3) implies in particular that $\forall m, m' \in \mathcal{M}_n$, $S_m + S_{m'} \subset \mathcal{S}_n$. This condition is useful
for the chaining argument used in the proofs, see Section 6.

3. Main results 
3.1. Oracle inequality. For a function $h$ and a space $S$, let

$$d(h, S) = \inf_{g \in S} \|h - g\| = \inf_{g \in S} \Big( \iint |h(x, y) - g(x, y)|^2\, dx\, dy \Big)^{1/2}.$$

The estimator $\hat\alpha_{\hat m}$, where $\hat\alpha_m$ is given by (4) and $\hat m$ is given by (5), satisfies
the following oracle inequality.

Theorem 1. Let (A1)-(A5) and (M1)-(M3) hold. Define the following penalty:

(7)  $$\mathrm{pen}(m) := K_0 (1 + \|\alpha\|_{\infty, A}) \frac{D_{m_1} D_{m_2}}{n},$$

where $K_0$ is a numerical constant. We have

(8)  $$E(\|\alpha 1(A) - \hat\alpha_{\hat m}\|^2) \le C \inf_{m \in \mathcal{M}_n} \big\{ d^2(\alpha 1(A), S_m) + \mathrm{pen}(m) \big\} + \frac{C'}{n},$$

where $C = C(f_0, \|f\|_{\infty, A})$ and $C'$ is a constant depending on $\phi_1, \phi_2, \|\alpha\|_{\infty, A}, f_0$.

The proof of Theorem 1 involves a deviation inequality for the empirical process

$$\nu_n(h) := \frac{1}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, dM^i(z),$$

where $M^i(t) = N^i(t) - \int_0^t \alpha(X_i, z)\, Y^i(z)\, dz$ are martingales, see Section 1, and an $L^2$-$L^\infty$
chaining argument.

Remark 5. The penalty involves the unknown quantity $\|\alpha\|_{\infty, A}$. This is a usual situation,
and the solution is to replace it by an estimator $\|\hat\alpha_{m_n}\|_{\infty, A}$, where $\hat\alpha_{m_n}$ is an estimator
of the collection, chosen on a space $S_{m_n}$ which is arbitrary, generally middle sized. Note
that, by doing this, the penalty function becomes random. For details, we refer to Lacour
(2007), Theorem 2.2.



3.2. Upper bound for the rate. From Theorem 1 we can derive the rate of convergence
of $\hat\alpha_{\hat m}$ over anisotropic Besov spaces. We recall that anisotropy is almost mandatory in
this context, see Remark 1. For that purpose, assume that $\alpha$ restricted to $A$ belongs to
the anisotropic Besov space $B^\beta_{2,\infty}(A)$ on $A$ with regularity $\beta = (\beta_1, \beta_2)$. Let us recall
the definition of $B^\beta_{2,\infty}(A)$. Let $\{e_1, e_2\}$ be the canonical basis of $\mathbb{R}^2$ and take $A^r_{h,i} := \{x \in
\mathbb{R}^2;\ x, x + h e_i, \dots, x + r h e_i \in A\}$, for $i = 1, 2$. For $x \in A^r_{h,i}$, let

$$\Delta^r_{h,i}\, g(x) = \sum_{k=0}^r (-1)^{r-k} \binom{r}{k} g(x + k h e_i)$$

be the $r$th difference operator with step $h$. For $t > 0$, the directional moduli of smoothness
are given by

$$\omega_{r_i, i}(g, t) = \sup_{|h| \le t} \Big( \int_{A^{r_i}_{h,i}} |\Delta^{r_i}_{h,i}\, g(x)|^2\, dx \Big)^{1/2}.$$

We say that $g$ is in the Besov space $B^\beta_{2,\infty}(A)$ if $\sup_{t > 0} \sum_{i=1}^2 t^{-\beta_i} \omega_{r_i, i}(g, t) < \infty$ for $r_i$
integers larger than $\beta_i$. More details concerning Besov spaces can be found in Triebel
(2006). The next corollary shows that $\hat\alpha_{\hat m}$ adapts to the unknown anisotropic smoothness
of $\alpha$.

Corollary 1. Assume that $\alpha$ restricted to $A$ belongs to the anisotropic Besov space $B^\beta_{2,\infty}(A)$
with regularity $\beta = (\beta_1, \beta_2)$ such that $\beta_1 > 1/2$ and $\beta_2 > 1/2$. We consider the piecewise
polynomial or wavelet spaces described in Subsection 2.4 (with the regularity $r$ of the
polynomials and the wavelets larger than $\beta_i - 1$). Then, under the assumptions of Theorem 1,
we have

$$E\|\alpha - \hat\alpha_{\hat m}\|_A^2 = O\big( n^{-\frac{2\bar\beta}{2\bar\beta + 2}} \big),$$

where $\bar\beta$ is the harmonic mean of $\beta_1$ and $\beta_2$ (i.e. $2/\bar\beta = 1/\beta_1 + 1/\beta_2$).
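The exponent of the rate in Corollary 1 depends on the two directional regularities only through their harmonic mean. The short Python sketch below (function names are hypothetical) computes the harmonic mean $\bar\beta$ and the exponent $2\bar\beta/(2\bar\beta + 2)$, which gives a quick feel for how a low regularity in one direction slows the rate.

```python
def harmonic_mean(beta1, beta2):
    """Harmonic mean of the two directional regularities: 2/beta = 1/beta1 + 1/beta2."""
    return 2.0 * beta1 * beta2 / (beta1 + beta2)


def rate_exponent(beta1, beta2):
    """Exponent r such that the risk bound of Corollary 1 is of order n**(-r)."""
    beta_bar = harmonic_mean(beta1, beta2)
    return 2.0 * beta_bar / (2.0 * beta_bar + 2.0)
```

For example, with $\beta_1 = 1$ and $\beta_2 = 3$ the harmonic mean is $1.5$ and the exponent is $0.6$, whereas the isotropic case $\beta_1 = \beta_2 = 1$ gives the exponent $0.5$.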

The rate of convergence achieved by $\hat\alpha_{\hat m}$ in Corollary 1 is optimal in the minimax sense,
as proved in Theorem 2 below. For trigonometric spaces, the result also holds, but for
$\beta_1 > 3/2$ and $\beta_2 > 3/2$ (because of (M1)).

Moreover, assuming for example that $\beta_2 > \beta_1$, one can see in the proof of Corollary 1
that the estimator chooses a space of dimension $D_{m_2^*} = D_{m_1^*}^{\beta_1/\beta_2} < D_{m_1^*}$. This shows that
the estimator is adaptive with respect to the approximation space for each directional
regularity.
3.3. Lower bound. In the next theorem, we prove that the rate $n^{-2\bar\beta/(2\bar\beta + 2)}$ is optimal
over $B^\beta_{2,\infty}(A)$, where we recall that $2/\bar\beta = 1/\beta_1 + 1/\beta_2$. Since the lower bound stated in
Theorem 2 is uniform over $B^\beta_{2,\infty}(A)$, we need to introduce the ball

$$B^\beta_{2,\infty}(A, L) = \{ \alpha \in B^\beta_{2,\infty}(A) : \|\alpha\|_{B^\beta_{2,\infty}(A)} \le L \},$$

where

(9)  $$\|\alpha\|_{B^\beta_{2,\infty}(A)} := \|\alpha\|_A + |\alpha|_{B^\beta_{2,\infty}(A)} = \|\alpha\|_A + \sup_{t > 0} \sum_{i=1}^2 t^{-\beta_i} \omega_{r_i, i}(\alpha, t).$$

Let us denote by $E_\alpha$ the integration w.r.t. the joint law $P^n_\alpha$, when the intensity is $\alpha$, of
the $n$-sample $(X_i, N^i(z), Y^i(z), z \le 1, i = 1, \dots, n)$.

Theorem 2. There is a positive constant $C_L$ such that

$$\inf_{\hat\alpha} \sup_{\alpha \in B^\beta_{2,\infty}(A, L)} E_\alpha \|\hat\alpha - \alpha\|_A^2 \ge C_L\, n^{-2\bar\beta/(2\bar\beta + 2)}$$

for $n$ large enough, where the infimum is taken among all estimators and where $C_L$ is a
constant that depends on $\beta$, $L$ and $A$ only.

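3.4. Estimation of $f$ and $f_0$. We recall that $f$ is the density of $\mu$, which is defined in
Equation (3). We define

(10)  $$\hat f_m = \arg\min_{h \in S_m} \upsilon_n(h) \quad \text{where} \quad \upsilon_n(h) = \|h\|^2 - \frac{2}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, Y^i(z)\, dz.$$

This estimator admits a simple explicit formulation:

(11)  $$\hat f_m = \sum_{(j,k) \in J_m \times K_m} \hat b_{j,k}\, \varphi^m_j(x)\, \psi^m_k(y), \quad \text{with} \quad \hat b_{j,k} = \frac{1}{n} \sum_{i=1}^n \varphi^m_j(X_i) \int_0^1 \psi^m_k(z)\, Y^i(z)\, dz.$$

As before, we consider estimation of $f$ over the compact set $A = [0, 1] \times [0, 1]$. We choose
the space $H_{m_2}$ as the space with maximal dimension, as explained below. Let us denote it
by $\mathcal{H}_n$, by $\mathcal{D}_n^{(2)} = \dim(\mathcal{H}_n)$ its dimension (see (M1)) and by $\bar n$ its index so that $H_{\bar n} = \mathcal{H}_n$.
Hence, we consider, instead of a general $\hat f_m$, the estimator

$$\hat f_{m_1} := \arg\min_{h \in F_{m_1} \otimes \mathcal{H}_n} \upsilon_n(h).$$

We are now in a position to define an estimator of $f_0$ by considering any $\inf_{(x,z) \in A} \hat f_{m_1}(x, z)$
with a given $m_1$. Indeed, an arbitrary choice is sufficient for our estimation problem
concerning $f_0$. In our setting, only a rough estimation of the lower bound on $f$ is useful.
Therefore, for the purpose of estimating $\alpha$, we can define

(12)  $$\hat f_0 := \inf_{(x,z) \in A} \hat f_{m_1^*}(x, z) \quad \text{with} \quad m^* = (D_{m_1^*}, \mathcal{D}_n^{(2)}).$$

Then, the following result holds: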
Then, the following result holds: 
Proposition 1. Consider $\hat f_0$ defined by (12) in the basis [T], with $\log n \le D_{m_1^*} \le
n^{1/2} / \sqrt{\log n}$ and $\mathcal{D}_n^{(2)} = n^{1/2} / \sqrt{\log n}$. Assume that $f \in B^\beta_{2,\infty}(A)$ with $\beta > 1$, then
$P(|\hat f_0 - f_0| > f_0 / 2) \le C_k / n^k$ for any integer $k$, where $C_k$ is a constant, and therefore $\hat f_0$
fulfills assumption (A5).

The proof of this result is given in Section 7.

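Hereafter, we develop a remark concerning the estimation of $f$ in order to explain
why we have selected the second dimension as large as possible. Let $f_{m_1}$ be
the orthogonal projection of the restriction of $f$ to $A$ on the space $F_{m_1} \otimes \mathcal{H}_n$, i.e. for
$m_n = (m_1, \bar n)$, $f_{m_1} = \sum_{(j,k) \in J_{m_1} \times \mathcal{K}_n} b_{j,k}\, \varphi^{m_1}_j \psi^{\bar n}_k$ with $|J_{m_1}| = D_{m_1}$ and $|\mathcal{K}_n| = \mathcal{D}_n^{(2)}$. We
obtain the following bias-variance decomposition.

Proposition 2. Under (M1), (M2), (A1) and (A4), we have

(13)  $$E(\|\hat f_{m_1} - f\|_A^2) \le \|f_{m_1} - f\|_A^2 + \frac{\phi_1\, \ell(A_2)\, D_{m_1}}{n},$$

where $\ell(A_2)$ is the Lebesgue measure of $A_2$.

Proof. We clearly have

(14)  $$\|\hat f_{m_1} - f\|_A^2 = \|f_{m_1} - f\|_A^2 + \|\hat f_{m_1} - f_{m_1}\|_A^2,$$

where the first term is the bias term and $\|\hat f_{m_1} - f_{m_1}\|_A^2 = \sum_{(j,k) \in J_{m_1} \times \mathcal{K}_n} (\hat b_{j,k} - b_{j,k})^2$ is
the variance term. In view of (11) we have $E(\hat b_{j,k}) = b_{j,k}$, and, as a consequence:

$$E(\|\hat f_{m_1} - f_{m_1}\|_A^2) = \sum_{j,k} \mathrm{Var}(\hat b_{j,k}) = \frac{1}{n} \sum_{j,k} \mathrm{Var}\Big( \varphi^{m_1}_j(X_1) \int \psi^{\bar n}_k(z)\, Y^1(z)\, dz \Big) \le \frac{1}{n} \sum_{j,k} E\Big[ (\varphi^{m_1}_j(X_1))^2 \Big( \int \psi^{\bar n}_k(z)\, Y^1(z)\, dz \Big)^2 \Big].$$

Now, we note that for any $A_2$-square integrable function $\xi$,

$$\sum_{k \in \mathcal{K}_n} \Big( \int \psi^{\bar n}_k(z)\, \xi(z)\, dz \Big)^2 \le \int_{A_2} \xi^2(z)\, dz$$

by a simple projection argument (the left-hand-side term is the squared norm of the
projection of $\xi$ on $\mathcal{H}_n$), and thus under assumption (A4),

$$E(\|\hat f_{m_1} - f_{m_1}\|_A^2) \le \frac{\ell(A_2)}{n} \sum_{j \in J_{m_1}} E\big( [\varphi^{m_1}_j(X_1)]^2 \big) \le \frac{\phi_1\, \ell(A_2)\, D_{m_1}}{n}.$$

Gathering the terms, the risk of the estimator is bounded as in (13). □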

Let us discuss the asymptotic rate of estimation of $f_A$, the restriction of $f$ to $A$, using
the above procedure. For that purpose, assume that $f_A$ belongs to $B^\beta_{2,\infty}(A)$ with regularity
$\beta = (\beta_1, \beta_2)$. Now, consider the collection of trigonometric polynomials for $\varphi_j, \psi_k$, and
apply the lemma of Lacour (2007) (see Section 5 below). The bias term is bounded by

$$\|f_{m_1} - f\|_A^2 \le C \big\{ D_{m_1}^{-2\beta_1} + [\mathcal{D}_n^{(2)}]^{-2\beta_2} \big\}.$$

It is worth noticing that the variance term (i.e. the last term of (13)) does not depend
on $\bar n$ nor on $\mathcal{D}_n^{(2)}$. This explains why the size of the projection space in the $z$-direction
must be chosen as large as possible, when the mean square risk is under study. Take
$\mathcal{D}_n^{(2)} = \sqrt{n / \log n}$ and assume that $\beta_2 > 1$, then (13) becomes

$$E(\|\hat f_{m_1} - f\|_A^2) \le C \Big[ D_{m_1}^{-2\beta_1} + \frac{D_{m_1}}{n} \Big] + C' \frac{\log n}{n}.$$

Therefore, choosing $D_{m_1} = n^{1/(2\beta_1 + 1)}$ gives the rate

$$E(\|\hat f_{m_1} - f\|_A^2) \le C'' n^{-2\beta_1/(2\beta_1 + 1)},$$

which is the standard asymptotic rate for a single variable function with regularity $\beta_1$.
We could study a model selection procedure and find a penalty function of order $D_{m_1}/n$,
so that a relevant space is chosen in an automatic way. We do not go into further details
since a rough estimation of $f_0$ is sufficient to estimate the conditional intensity $\alpha$.



4. Illustration 




Figure 1. Case (NL) Estimated (top left) and true (top right) conditional 
hazard rates and example of sections (bottom) for a fixed value of x (left) 
or y (right). 
In this section, we give a numerical illustration of the adaptive estimator $\hat\alpha_{\hat m}$, defined
in Section 2, computed with the dyadic histogram basis [H]. We sample i.i.d. data
$(X_1, T_1), \dots, (X_n, T_n)$ in three particular cases of the regression model of Example 1 from
Section 1. For the sake of simplicity, we simulate the covariates $X_i$ with the uniform
distribution on $[0, 1]$. The size of the data set is $n = 1000$.

• Case (NL). Non-Linear regression:

$$T_i = b(X_i) + \sigma \varepsilon_i.$$

We simulate $\varepsilon_i$ with a $\chi^2(4)$ distribution and $b(x) = 2x + 5$. Note that in this case,
the hazard function to be estimated is

$$\alpha_{NL}(x, t) = \frac{1}{\sigma}\, \alpha_\varepsilon\Big( \frac{t - b(x)}{\sigma} \Big),$$

where $\alpha_\varepsilon$ denotes the hazard function of $\varepsilon$.

• Case (AFT). Accelerated Failure Time model:

$$\log(T_i) = a + b X_i + \varepsilon_i,$$

where the $\varepsilon_i$ are standard normal and $a = 5$ and $b = 2$. The hazard function to be
estimated is then:

$$\alpha_{AFT}(x, t) = \frac{\alpha_\varepsilon(\log(t) - (a + b x))}{t}.$$

• Case (PH). Proportional Hazards model: in this case, the hazard writes

$$\alpha(x, t) = \exp(b x)\, \alpha_0(t).$$

We take $b = 0.4$ and $\alpha_0(t) = a \lambda t^{a-1}$, which is a Weibull hazard function with $a = 3$
and $\lambda = 1$.
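As an illustration of how data from Case (PH) could be generated, the sketch below simulates lifetimes by inversion of the conditional survival function $S(t|x) = \exp(-\lambda e^{bx} t^a)$, which follows from integrating the hazard $\exp(bx)\, a \lambda t^{a-1}$. The function names, the seed handling and the use of inverse-transform sampling are assumptions of this sketch; the simulation method actually used for the figures is not described in the text, and no censoring mechanism is simulated here.

```python
import math
import random


def ph_hazard(x, t, b=0.4, a=3.0, lam=1.0):
    """Conditional hazard exp(b x) * a * lam * t**(a - 1) of Case (PH)."""
    return math.exp(b * x) * a * lam * t ** (a - 1.0)


def ph_survival(x, t, b=0.4, a=3.0, lam=1.0):
    """Conditional survival function obtained by integrating the hazard."""
    return math.exp(-lam * math.exp(b * x) * t ** a)


def ph_time(u, x, b=0.4, a=3.0, lam=1.0):
    """Solve S(t | x) = u for t, with u in (0, 1]."""
    return (-math.log(u) / (lam * math.exp(b * x))) ** (1.0 / a)


def simulate_ph(n, seed=0):
    """Draw n pairs (X_i, T_i) with X_i uniform on [0, 1) and T_i from Case (PH)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    # 1 - U lies in (0, 1], so the logarithm in ph_time is always defined
    ts = [ph_time(1.0 - rng.random(), x) for x in xs]
    return xs, ts
```

Right-censored observations $(Z_i, \delta_i)$ would then be obtained by combining these lifetimes with independently drawn censoring times.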

The penalty is taken as

$$\mathrm{pen}(m_1, m_2) = 5\, \|\alpha\|_{\infty, A}\, \frac{2^{m_1 + m_2}}{n},$$

where $\|\alpha\|_{\infty, A}$ is estimated as the maximum of the estimated histogram coefficients ($\max_{j,k} \hat a_{j,k}$)
on the largest space which is considered (taken with dimension $\sqrt{n}$).
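The selection step with this penalty can be sketched as follows. The function name `select_model` and its inputs are hypothetical: the dictionary `contrasts` is assumed to hold the already computed values $\gamma_n(\hat\alpha_m)$ for each dyadic histogram model $m = (m_1, m_2)$, and `sup_norm` stands for the plug-in estimate of $\|\alpha\|_{\infty, A}$.

```python
def select_model(contrasts, n, sup_norm, kappa=5.0):
    """Return the model minimizing the penalized contrast.

    contrasts : dict mapping (m1, m2) to the minimized contrast gamma_n(alpha_hat_m)
    n         : sample size
    sup_norm  : estimate of the sup norm of the intensity on A
    kappa     : numerical constant of the penalty (5 in the illustration)
    """

    def penalized(m):
        m1, m2 = m
        return contrasts[m] + kappa * sup_norm * 2 ** (m1 + m2) / n

    return min(contrasts, key=penalized)
```

Larger models always decrease the contrast, so the penalty term is what prevents the procedure from systematically choosing the largest space; the two dimensions $m_1$ and $m_2$ are selected jointly and need not coincide.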

We can see from Figures 1-3 that the algorithm exploits the opportunity (Figures 1
and 3) of choosing different dimensions in the two directions, and that it captures well the
general form of the surfaces.

5. Proofs of the main results 

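5.1. Proof of Theorem 1. We define, for $h_1, h_2$ in $L^2 \cap L^\infty(A)$, the empirical scalar
product

(15)  $$\langle h_1, h_2 \rangle_n = \frac{1}{n} \sum_{i=1}^n \int_0^1 h_1(X_i, z)\, h_2(X_i, z)\, Y^i(z)\, dz\ 1(X_i \in [0, 1])$$

and the associated empirical norm $\|h_1\|_n^2 = \langle h_1, h_1 \rangle_n$, which is such that

$$E(\|h_1\|_n^2) = \iint_A h_1^2(x, y)\, d\mu(x, y) = \iint_A h_1^2(x, y)\, f(x, y)\, dx\, dy = \|h_1\|_\mu^2,$$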




Figure 2. Case (AFT) Estimated (top left) and true (top right) conditional
hazard rates and example of sections (bottom) for a fixed value of x
(left) or y (right).




Figure 3. Case (PH) Estimated (top left) and true (top right) conditional 
hazard rates and example of sections (bottom) for a fixed value of x (left) 
or y (right). 
where we recall that $f$ denotes the density of $\mu$ w.r.t. the Lebesgue measure on $A$. We
shall use the following sets:

$$\hat\Gamma_m = \big\{ \min \mathrm{Sp}(G_m) \ge \max(\hat f_0 / 3, n^{-1/2}) \big\}, \quad \hat\Gamma := \bigcap_{m \in \mathcal{M}_n} \hat\Gamma_m,$$

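(16)  $$\Delta := \Big\{ \forall h \in \mathcal{S}_n : \Big| \frac{\|h\|_n^2}{\|h\|_\mu^2} - 1 \Big| \le \frac{1}{2} \Big\}, \quad \text{and} \quad \Omega := \Big\{ \Big| \frac{\hat f_0}{f_0} - 1 \Big| \le \frac{1}{2} \Big\}.$$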

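For $m \in \mathcal{M}_n$, we denote by $\alpha_m$ the orthogonal projection on $S_m$ of $\alpha$ restricted to $A$. The
following bounds hold:

$$E(\|\hat\alpha_{\hat m} - \alpha\|_A^2) \le 2 \|\alpha - \alpha_m\|_A^2 + 2 E(\|\hat\alpha_{\hat m} - \alpha_m\|_A^2\, 1(\Delta \cap \Omega)) + 2 E(\|\hat\alpha_{\hat m} - \alpha_m\|_A^2\, 1(\Delta^c \cap \Omega)) + 2 E(\|\hat\alpha_{\hat m} - \alpha_m\|_A^2\, 1(\Omega^c))$$

$$\le 2 \|\alpha - \alpha_m\|_A^2 + 2 E(\|\hat\alpha_{\hat m} - \alpha_m\|_A^2\, 1(\Delta \cap \Omega))$$

(17)  $$\qquad + 4 E((\|\hat\alpha_{\hat m}\|_A^2 + \|\alpha\|_A^2)\, 1(\Delta^c \cap \Omega)) + 4 E((\|\hat\alpha_{\hat m}\|_A^2 + \|\alpha\|_A^2)\, 1(\Omega^c)).$$

We use the following results, whose proofs can be found in Sections 6.2 and 7.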

Proposition 3. We have E(||am||^) < C'n^, where C is a constant. 

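Proposition 4. If (M1) is fulfilled, we have $P(\Delta^c) \le C_k / n^k$ for any $k \ge 1$, when $n$ is
large enough, where $C_k$ is a constant.

Moreover, (A5) ensures that $P(\Omega^c) \le C_k / n^k$ for any integer $k$. Thus, using
Propositions 3 and 4 and Assumption (A5), we get

$$E((\|\hat\alpha_{\hat m}\|_A^2 + \|\alpha\|_A^2)\, 1(\Delta^c \cap \Omega)) + E((\|\hat\alpha_{\hat m}\|_A^2 + \|\alpha\|_A^2)\, 1(\Omega^c))$$
$$\le \|\alpha\|_A^2 (P(\Omega^c) + P(\Delta^c)) + E^{1/2}(\|\hat\alpha_{\hat m}\|_A^4) (P^{1/2}(\Delta^c) + P^{1/2}(\Omega^c))$$

(18)  $$\le C_a / n.$$

Thus it remains to study $E(\|\hat\alpha_{\hat m} - \alpha_m\|_A^2\, 1(\Delta \cap \Omega))$. We state the following lemma:

Lemma 1. The following embedding holds:

$$\Delta \cap \Omega \subset \hat\Gamma \cap \Omega.$$

As a consequence, for all $m \in \mathcal{M}_n$, the matrices $G_m$ are invertible on $\Delta \cap \Omega$.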
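Let us now define the centered empirical process

$$\nu_n(h) = \frac{1}{n} \sum_{i=1}^n \Big( \int_0^1 h(X_i, z)\, dN^i(z) - \int_0^1 h(X_i, z)\, \alpha(X_i, z)\, Y^i(z)\, dz \Big)$$

(19)  $$= \frac{1}{n} \sum_{i=1}^n \int_0^1 h(X_i, z)\, dM^i(z),$$

where we use the Doob-Meyer decomposition. For any $h_1, h_2 \in (L^2 \cap L^\infty)(A)$, we have

$$\gamma_n(h_1) - \gamma_n(h_2) = \|h_1 - h_2\|_n^2 + 2 \langle h_1 - h_2, h_2 \rangle_n - \frac{2}{n} \sum_{i=1}^n \int_0^1 (h_1 - h_2)(X_i, z)\, dN^i(z)$$
$$= \|h_1 - h_2\|_n^2 + 2 \langle h_1 - h_2, h_2 - \alpha \rangle_n - 2 \nu_n(h_1 - h_2).$$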
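Now, on $\Delta \cap \Omega$, we have

$$\gamma_n(\hat\alpha_{\hat m}) + \mathrm{pen}(\hat m) \le \gamma_n(\alpha_m) + \mathrm{pen}(m).$$

It follows, from the inequality $2xy \le \frac{1}{\theta} x^2 + \theta y^2$ with $x, y, \theta \in \mathbb{R}^+$, that, on $\Delta \cap \Omega$,

$$\|\hat\alpha_{\hat m} - \alpha_m\|_n^2 \le 2 \langle \hat\alpha_{\hat m} - \alpha_m, \alpha - \alpha_m \rangle_n + \mathrm{pen}(m) + 2 \nu_n(\hat\alpha_{\hat m} - \alpha_m) - \mathrm{pen}(\hat m)$$
$$\le \frac{1}{4} \|\hat\alpha_{\hat m} - \alpha_m\|_n^2 + 4 \|\alpha - \alpha_m\|_n^2 + \mathrm{pen}(m) + \frac{1}{4} \|\hat\alpha_{\hat m} - \alpha_m\|_\mu^2 + 4 \sup_{h \in B^\mu_{m, \hat m}(0, 1)} \nu_n^2(h) - \mathrm{pen}(\hat m),$$

where $B^\mu_{m, m'}(0, 1) := \{ h \in S_m + S_{m'} : \|h\|_\mu \le 1 \}$. This yields

$$\frac{3}{4} \|\hat\alpha_{\hat m} - \alpha_m\|_n^2 \le 4 \|\alpha - \alpha_m\|_n^2 + \mathrm{pen}(m) + \frac{1}{4} \|\hat\alpha_{\hat m} - \alpha_m\|_\mu^2 + 4 \Big( \sup_{h \in B^\mu_{m, \hat m}(0, 1)} \nu_n^2(h) - p(m, \hat m) \Big)_+ + 4 p(m, \hat m) - \mathrm{pen}(\hat m).$$

Now, let us choose the penalty such that

(20)  $$\forall m, m', \quad 4 p(m, m') \le \mathrm{pen}(m) + \mathrm{pen}(m'),$$

and use the definition of $\Delta$. We obtain on $\Delta \cap \Omega$: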
1 

2' 



\arh-ajn\\f, < 4||a - am||„ + 2pen(m) 



1 _ ||2 



+ 4 V ( sup ul{h) - p{m,m')) 



and thus on A n 

^||dm-arrt||^ < 4|| a - || ^ + 2pen(m) 

+4 ( sup u'^{h) — p{m,m')] . 

Using the following proposition, we can achieve the proof of Theorem [T} 
Proposition 5. Let 

( l\ l-\ , II II \ 

p[m,m ) = K[l + ||a||oo,A)- 



n 



where Co is a numerical constant. Under the assumptions of Theorem^ we have 



V e( sup (z.2(/i)-p(m,m'))+l(A)) < 

,pA4 ^/iGB" ,(0,1) ^ 

C^-'vlri m..m.' ^ ' 



n 



This proposition entails: 

(21) 7E(||dA - am||^l(A n n)) < 4||a - a„||2 + 2pen(m) + ^, 

4 ^ n 
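Gathering (17), (18) and (21), and using $f \ge f_0$ on $A$ (Assumption (A2)) together with
$\|h\|_\mu^2 \le \|f\|_{\infty, A} \|h\|_A^2$, leads to

$$E(\|\hat\alpha_{\hat m} - \alpha\|_A^2) \le 2 \|\alpha - \alpha_m\|_A^2 + \frac{8}{f_0} \Big( 4 \|\alpha - \alpha_m\|_\mu^2 + 2\, \mathrm{pen}(m) + \frac{C_1}{n} \Big) + \frac{4 C_a}{n}$$

(22)  $$\le 2 \Big( 1 + \frac{16 \|f\|_{\infty, A}}{f_0} \Big) \|\alpha - \alpha_m\|_A^2 + \frac{16}{f_0}\, \mathrm{pen}(m) + \frac{C_3}{n}$$

for any $m \in \mathcal{M}_n$. This concludes the proof of Theorem 1.

□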



5.2. Proof of Corollary [T| To control the bias term, we state the following lemma proved 
Lacour| ( |2007[ ) and following from |Hochmuth] ( |2002[ ) and |Nikorskii| ( |1975[ ) : 



m 



Lemma. 



Lacour (2001) Let s belong to B2^{A) where (3 = (/3i,/32)- M^e consider that 
S'^ is one of the following spaces on A of dimension DrniDm2 ■ 

• a space of piecewise polynomials of degrees bounded by Si > f3i — 1 (i = 1,2) based 
on a partition with rectangles of sidelengthes and 1/Dm2, 



are 



a linear span of {(pxipf^, X E U™^A(j),/x G U™^M(A:)} where {(f)\} and {"0^} 
orthonormal wavelet bases of respective regularities si > (3i — 1 and S2 > /?2 — 1 



(here Dr. 



1,2;, 



• the space of trigonometric polynomials with degree smaller than D„i^ in the first 
direction and smaller than in the second direction. 

Let Sm be the orthogonal projection of s on S!^. Then, there exists a positive constant Cq 
such that 



Sm\\A 



1/2 



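If we choose $S_m$ as one of the $S'_m$'s, we can apply the above lemma to the function $\alpha_A$, the restriction of $\alpha$ to $A$. As $\alpha_m$ has been defined as the orthogonal projection of $\alpha_A$ on $S_m$, we get:
$$\|\alpha - \alpha_m\|_A \le C_0 [D_{m_1}^{-\beta_1} + D_{m_2}^{-\beta_2}].$$
Now, according to Theorem 1, we obtain:
$$E\|\hat\alpha_{\hat m} - \alpha\|_A^2 \le C'' \inf_{m \in \mathcal{M}_n} \Big\{ D_{m_1}^{-2\beta_1} + D_{m_2}^{-2\beta_2} + \frac{D_{m_1} D_{m_2}}{n} \Big\}.$$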



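In particular, if $m^* = (m_1^*, m_2^*)$ is such that
$$D_{m_1^*} = \lfloor n^{\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)} \rfloor \quad \text{and} \quad D_{m_2^*} = \lfloor (D_{m_1^*})^{\beta_1/\beta_2} \rfloor,$$
then
$$E\|\hat\alpha_{\hat m} - \alpha\|_A^2 \le C''' \Big\{ D_{m_1^*}^{-2\beta_1} + \frac{D_{m_1^*}^{1+\beta_1/\beta_2}}{n} \Big\} = O\big(n^{-2\beta_1\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)}\big) = O\big(n^{-2\bar\beta/(2\bar\beta+2)}\big),$$
where the harmonic mean of $\beta_1$ and $\beta_2$ is $\bar\beta = 2\beta_1\beta_2/(\beta_1+\beta_2)$. The condition $D_{m_1} \le n^{1/2}/\sqrt{\log n}$ allows this choice of $m$ only if $\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2) < 1/2$, i.e. if $\beta_1 - \beta_2 + 2\beta_1\beta_2 > 0$. In the same manner, the condition $\beta_2 - \beta_1 + 2\beta_1\beta_2 > 0$ must be verified. Both conditions hold if $\beta_1 > 1/2$ and $\beta_2 > 1/2$.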
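
The exponent bookkeeping behind this choice of $m^*$ can be checked mechanically. The following is a minimal sketch in exact rational arithmetic, ignoring integer parts; the function name `rate_exponents` and the sample regularities are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction


def rate_exponents(beta1, beta2):
    """Return the powers of n appearing in the bias/variance trade-off.

    Assumes D_{m_1^*} = n^{d1} and D_{m_2^*} = n^{d2} (integer parts ignored).
    """
    b1, b2 = Fraction(beta1), Fraction(beta2)
    s = b1 + b2 + 2 * b1 * b2
    d1 = b2 / s                      # exponent of D_{m_1^*}
    d2 = d1 * b1 / b2                # exponent of D_{m_2^*} = (D_{m_1^*})^{beta1/beta2}
    bias1 = -2 * b1 * d1             # exponent of D_{m_1^*}^{-2 beta1}
    bias2 = -2 * b2 * d2             # exponent of D_{m_2^*}^{-2 beta2}
    variance = d1 + d2 - 1           # exponent of D_{m_1^*} D_{m_2^*} / n
    harmonic = 2 * b1 * b2 / (b1 + b2)
    target = -2 * harmonic / (2 * harmonic + 2)
    return d1, d2, bias1, bias2, variance, target


# Hypothetical regularities, chosen only for illustration.
d1, d2, bias1, bias2, variance, target = rate_exponents(Fraction(3, 2), 2)
print("bias and variance exponents:", bias1, bias2, variance)
print("target exponent -2*bar(beta)/(2*bar(beta)+2):", target)
print("dimension constraints d1, d2 < 1/2:", d1 < Fraction(1, 2), d2 < Fraction(1, 2))
```

The three terms share the same exponent, which is why this choice balances the squared bias against the variance term.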
5.3. Proof of Theorem 2. In order to prove Theorem 2, we use the following theorem from Tsybakov (2003), which is a standard tool for the proof of such a lower bound. We say that $d$ is a semi-distance on some set $\Theta$ if it is symmetric and if it satisfies the triangle inequality and $d(\theta,\theta) = 0$ for any $\theta \in \Theta$. We consider $K(P,Q) := \int \log(dP/dQ)\,dP$ the Kullback-Leibler divergence between probability measures $P$ and $Q$ such that $P \ll Q$.



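Theorem (Tsybakov (2003)). Let $(\Theta, d)$ be a set endowed with a semi-distance $d$. We suppose that $\{P_\theta : \theta \in \Theta\}$ is a family of probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$ and that $v > 0$. If there exist $\{\theta_0, \dots, \theta_M\} \subset \Theta$, with $M \ge 2$, such that

(1) $d(\theta_j, \theta_k) \ge 2v$ $\forall\, 0 \le j < k \le M$,

(2) $P_{\theta_j} \ll P_{\theta_0}$ $\forall\, 1 \le j \le M$,

(3) $\frac{1}{M} \sum_{j=1}^M K(P_{\theta_j}, P_{\theta_0}) \le a \log(M)$ for some $a \in (0, 1/8)$,

then
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} E_\theta[(v^{-1} d(\hat\theta, \theta))^2] \ge \frac{\sqrt{M}}{1 + \sqrt{M}} \Big( 1 - 2a - 2\sqrt{\frac{a}{\log(M)}} \Big),$$
where the infimum is taken among all estimators.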

We construct a family of functions {ao,...,aM} that satisfies points (l)-(3). Let 
ao(x,t) = S B) where i3 is a compact set such that A = Ai x A2 C B x B 

and \B\ > 2\A\-I'^ jL. As a consequence, we have a{)(x^t) > for {x^t) G A and 
||ao|| o/3 = ||aolU + |o^o| 0/3 < -/^/2 since |ao| n/s ij,-. = 0, see (|9|. We shall denote 

2,00^-' 2,00V/ 2,00^/ 

for short oq = in the following. Let ■0 be a very regular wavelet with compact support 

(the Daubechies wavelet for instance), and for $j = (j_1, j_2) \in \mathbb{N}^2$ and $k = (k_1, k_2) \in \mathbb{Z}^2$, let us consider
$$\psi_{j,k}(x,t) = 2^{(j_1+j_2)/2} \psi(2^{j_1} t - k_1) \psi(2^{j_2} x - k_2).$$

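Let $S_{j,k}$ stand for the support of $\psi_{j,k}$. We consider the maximal set $R_j \subset \mathbb{Z}^2$ such that

(23) $S_{j,k} \subset A$, $\forall k \in R_j$ and $S_{j,k} \cap S_{j,k'} = \emptyset$, $\forall k, k' \in R_j$, $k \ne k'$.

The cardinality of $R_j$ satisfies $|R_j| = c\,2^{j_1+j_2}$, where $c$ is a positive constant that depends on $A$ and on the support of $\psi$ only. Consider the set $\Omega_j = \{0,1\}^{|R_j|}$ and define for any $\omega = (\omega_k) \in \Omega_j$
$$\alpha(\cdot; \omega) := \alpha_0 + \sqrt{\frac{b}{n}} \sum_{k \in R_j} \omega_k \psi_{j,k},$$
where $b > 0$ is some constant to be chosen below. In view of (23) we have
$$\|\alpha(\cdot;\omega) - \alpha(\cdot;\omega')\|_A = \Big( \frac{b\,\rho(\omega,\omega')}{n} \Big)^{1/2}$$
where
$$\rho(\omega,\omega') := \sum_{k \in R_j} 1(\omega_k \ne \omega'_k)$$
is the Hamming distance on $\Omega_j$. Using a result of Varshamov-Gilbert - see Tsybakov (2003) - we can find a subset $\{\omega^{(0)}, \dots, \omega^{(M_j)}\}$ of $\Omega_j$ such that
$$\omega^{(0)} = (0, \dots, 0), \quad \rho(\omega^{(p)}, \omega^{(q)}) \ge |R_j|/8$$
for any $0 \le p < q \le M_j$, where $M_j \ge 2^{|R_j|/8}$. We consider the family $A_j = \{\alpha_0, \dots, \alpha_{M_j}\}$ where $\alpha_p = \alpha(\cdot; \omega^{(p)})$. This family satisfies for any $0 \le p < q \le M_j$
$$\|\alpha_p - \alpha_q\|_A \ge \Big( \frac{b |R_j|}{8n} \Big)^{1/2} = 2 v_j$$
for $v_j := \sqrt{b |R_j| / (32 n)}$. This proves point (1). Now, let us gather here some properties for this family of functions. We have
$$\|\alpha(\cdot;\omega) - \alpha_0\|_{\infty,A} \le \Big( \frac{b\,2^{j_1+j_2}}{n} \Big)^{1/2} \|\psi\|_\infty^2 \le \alpha_0/3$$
and consequently $\alpha(x,t;\omega) \ge 2\alpha_0/3 > 0$ for any $(x,t) \in A$ and $\omega \in \Omega_j$ whenever

(24) $\Big( \frac{b\,2^{j_1+j_2}}{n} \Big)^{1/2} \|\psi\|_\infty^2 \le \frac{\alpha_0}{3}.$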
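
The Varshamov-Gilbert step used above only asserts the existence of a well-separated subset of the hypercube. As an illustration, the sketch below builds such a subset by a greedy search in a small case; the greedy procedure and the names `hamming` and `greedy_packing` are assumptions made for the example and are not the argument used in the proof.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two binary tuples of equal length."""
    return sum(a != b for a, b in zip(u, v))


def greedy_packing(m, min_dist, target_size):
    """Greedily collect words of {0,1}^m at pairwise distance >= min_dist.

    The all-zero word is always included first; the search stops as soon
    as target_size words have been found.
    """
    code = [tuple([0] * m)]
    for w in product((0, 1), repeat=m):
        if len(code) >= target_size:
            break
        if all(hamming(w, c) >= min_dist for c in code):
            code.append(w)
    return code


m = 16                          # plays the role of |R_j|
target = 2 ** (m // 8) + 1      # words omega^(0), ..., omega^(M_j) with M_j = 2^{m/8}
code = greedy_packing(m, m / 8, target)
print(len(code), "words with pairwise Hamming distance >=", m / 8)
```

For larger values of $|R_j|$ an exhaustive greedy search quickly becomes impractical, which is one reason the proof relies on the existence result rather than on an explicit construction.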



Using Hochmuth (2002), we have for $\psi$ smooth enough that
Hence, if 



(2^i^i + 2^2/32 )(2ii+i2) 1/2 ^ ^ 



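we have $\|\alpha(\cdot;\omega)\|_{B^\beta_{2,\infty}} \le L$, so $\alpha(\cdot;\omega) \in B^\beta_{2,\infty}(A,L)$ for any $\omega \in \Omega_j$. This proves that $A_j \subset B^\beta_{2,\infty}(A,L)$.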



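Points (2) and (3) are derived using Jacod's formula (see Andersen et al. (1993)). Indeed, we can prove that the log-likelihood $\ell(\alpha, \alpha_0) := \log(dP_\alpha/dP_{\alpha_0})$ of $N$ writes
$$\ell(\alpha,\alpha_0) = \int_0^1 (\log\alpha(X,t) - \log\alpha_0(X,t))\,dN(t) - \int_0^1 (\alpha(X,t) - \alpha_0(X,t))\,Y(t)\,dt.$$
For any $\alpha \in A_j$, we have $\|\alpha - \alpha_0\|_{\infty,A} \le \alpha_0/3 \le \alpha(x,t)/2$ for any $(x,t) \in A$. The Doob-Meyer decomposition allows us to write that, under $P_\alpha$:
$$\ell(\alpha,\alpha_0) = \int_0^1 \Big( \Phi_{1/\alpha(X,t)}(\alpha(X,t) - \alpha_0(X,t)) - (\alpha(X,t) - \alpha_0(X,t)) \Big) Y(t)\,dt + \int_0^1 (\log\alpha(X,t) - \log\alpha_0(X,t))\,dM(t)$$
where $\Phi_a(x) := -\log(1 - ax)/a$ for $a > 0$ and $x < 1/a$. But since $\Phi_a(x) \le x + a x^2$ for any $x \le 1/(2a)$, we obtain
$$\ell(\alpha,\alpha_0) \le \frac{3}{2\alpha_0} \int_0^1 (\alpha(X,t) - \alpha_0(X,t))^2\,Y(t)\,dt + \int_0^1 (\log\alpha(X,t) - \log\alpha_0(X,t))\,dM(t)$$
which gives by integration with respect to $P_\alpha$
$$K(P_\alpha, P_{\alpha_0}) \le \frac{3\|\alpha - \alpha_0\|_\mu^2}{2\alpha_0} \le \frac{3\|f_X\|_\infty \|\alpha - \alpha_0\|_A^2}{2\alpha_0} \le \frac{3 b \|f_X\|_\infty |R_j|}{2 n \alpha_0}$$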
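for any $\alpha \in A_j$. Since the counting processes $(N^1, \dots, N^n)$ are independent, we have $K(P^n_\alpha, P^n_{\alpha_0}) = n K(P_\alpha, P_{\alpha_0})$ and
$$\frac{1}{M_j} \sum_{p=0}^{M_j} K(P^n_{\alpha_p}, P^n_{\alpha_0}) \le \frac{3 b \|f_X\|_\infty |R_j|}{2\alpha_0} \le a \log(M_j)$$
with $a = 12 b \|f_X\|_\infty / (\alpha_0 \log 2) \in (0, 1/8)$ for $b$ small enough. It only remains to choose the levels $j_1$ and $j_2$ so that (24) and (25) hold, and to compute the corresponding $v_j$. We take $j = (j_1, j_2)$ such that
$$c_1/2 \le 2^{j_1} n^{-\beta_2/(\beta_1+\beta_2+2\beta_1\beta_2)} \le c_1 \quad \text{and} \quad c_2/2 \le 2^{j_2} n^{-\beta_1/(\beta_1+\beta_2+2\beta_1\beta_2)} \le c_2$$
where $c_1$ and $c_2$ are positive constants satisfying $(c_1^{\beta_1} + c_2^{\beta_2}) \sqrt{c_1 c_2} \le L/(2\sqrt{bc})$. For this choice, $2^{j_1+j_2}/n \le c_1 c_2 n^{-2\bar\beta/(2\bar\beta+2)}$, so (24) holds for $n$ large enough and (25) holds and
$$v_j \ge C_3 n^{-\bar\beta/(2\bar\beta+2)} \quad \text{where } C_3 = \sqrt{b c c_1 c_2 / 128}. \qquad \square$$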
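
The requirement "$b$ small enough" in point (3) can be made explicit. The sketch below assumes the bound $3b\|f_X\|_\infty|R_j|/(2\alpha_0)$ on the averaged divergence and the Varshamov-Gilbert bound $\log M_j \ge |R_j|\log(2)/8$; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def kl_constant(b, fx_sup, alpha0):
    """Constant a with 3*b*fx_sup*|R_j|/(2*alpha0) <= a*log(M_j),
    assuming log(M_j) >= |R_j|*log(2)/8."""
    return 12.0 * b * fx_sup / (alpha0 * math.log(2.0))


def largest_b(fx_sup, alpha0):
    """Supremum of the values of b for which kl_constant(b, ...) < 1/8."""
    return alpha0 * math.log(2.0) / (96.0 * fx_sup)


# Hypothetical numerical values, chosen only for illustration.
fx_sup, alpha0, card_r = 2.0, 0.5, 64
b = 0.5 * largest_b(fx_sup, alpha0)
a = kl_constant(b, fx_sup, alpha0)
kl_sum_bound = 3.0 * b * fx_sup * card_r / (2.0 * alpha0)
log_m_lower = card_r * math.log(2.0) / 8.0
print("a =", a, "and a < 1/8:", a < 1.0 / 8.0)
print("KL bound <= a*log(M_j):", kl_sum_bound <= a * log_m_lower + 1e-12)
```

Under these assumptions any $b$ below $\alpha_0\log(2)/(96\|f_X\|_\infty)$ is admissible, independently of $n$ and of the level $j$.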

6. Deviation and maximal inequalities for the empirical process 

Usually, in model selection (see for instance Massart (2007)), the penalty is explained using the so-called Talagrand's deviation inequality for the maximum of empirical processes. Because the empirical process (see Equation (19)) considered here has a particular structure, we cannot use directly Talagrand's inequality. In this Section, we prove Bennett and Bernstein inequalities for $\nu_n(\cdot)$, and derive a maximal bound using the so-called
chaining technique which explains the penalty ([T]). 

6.1. Deviation inequality. 

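Lemma 2. For any positive $\delta$, $\epsilon$ and for any function $h \in (L^2 \cap L^\infty)(A)$, we have the following Bennett-type deviation inequality:
$$P(\nu_n(h) \ge \epsilon, \|h\|_n \le \delta) \le \exp\Big( - \frac{n \delta^2 \|\alpha\|_{\infty,A}}{\|h\|_{\infty,A}^2}\, g\Big( \frac{\epsilon \|h\|_{\infty,A}}{\|\alpha\|_{\infty,A} \delta^2} \Big) \Big)$$
where $g(x) = (1 + x)\log(1 + x) - x$ for any $x \ge 0$. As a consequence, we obtain the following Bernstein-type inequalities:

(26) $P(\nu_n(h) \ge \epsilon, \|h\|_n \le \delta) \le \exp\Big( - \dfrac{n\epsilon^2/2}{\|\alpha\|_{\infty,A} \delta^2 + \frac{1}{3} \epsilon \|h\|_{\infty,A}} \Big)$

and

(27) $P\big(\nu_n(h) \ge \delta \sqrt{2 \|\alpha\|_{\infty,A}\, x} + \|h\|_{\infty,A}\, x/3,\ \|h\|_n^2 \le \delta^2\big) \le \exp(-nx).$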
Proof. Remark that $\nu_n(h) = \nu(h,1)$ where $\nu(h,\cdot)$ is the stochastic process given by
$$n\nu(h,t) := \sum_{i=1}^n \int_0^t h(X_i,z)\,dM^i(z) := n \sum_{i=1}^n \nu(h,t)^i.$$
The predictable variation of $M^i$ is given by $\langle M^i(t) \rangle = \int_0^t \alpha(X_i,z) Y^i(z)\,dz$, so we have
$$\langle n\nu(h,t)^i \rangle = \int_0^t h(X_i,z)^2 \alpha(X_i,z) Y^i(z)\,dz$$
for any $t \in [0,1]$. Moreover, we have $\Delta M^i(t) \in \{0,1\}$ for any $i = 1, \dots, n$ since the counting processes $N^i$ admit intensities. We can write $\nu(h,t)^i = \nu(h,t)^{i,c} + \nu(h,t)^{i,d}$ where $\nu(h,t)^{i,c}$ is a continuous martingale and where $\nu(h,t)^{i,d}$ is a purely discrete martingale (see e.g. Liptser and Shiryayev (1989)). For some $a > 0$ (to be chosen later on) we define $U_a^i(t) := a n \nu^i(h,t) - S_a^i(t)$, where $S_a^i(t)$ is the compensator of

(28) $\frac{1}{2} \langle a n \nu(h,t)^{i,c} \rangle + \sum_{s \le t} \big( \exp(a |\Delta n\nu(h,s)^i|) - 1 - a |\Delta n\nu(h,s)^i| \big).$

We know from the proof of Lemma 2.2 and Corollary 2.3 of van de Geer (1995) that $\exp(U_a^i(t))$ is a supermartingale. Using the standard Cramér-Chernoff method (see for instance Massart (2007), Chapter 2), we have, for any $a > 0$:



F{iyn{h)>e,\\h\\n<S 
= p[' exp(anf„(/i)) > exp(nae), \\h\\n < S 

n n 

"exp (an J^K/i, 1)^-^5^(1 



< E 



1/2 



i=l 



i=l 



n 

< (E[exp ( 5^5^(1) - ane)l{\\h\\n < 6} 



i=l 



E 

1/2 



exp(Y,Sl{l)-ane)l{\\h\\n<S} 



1/2 



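The last inequality holds since $\exp(U_a^i(t)) = \exp(a n \nu^i(h,t) - S_a^i(t))$ are independent supermartingales with $U_a^i(0) = 0$, so that $E[\exp(U_a^i(t))] \le 1$, for $i = 1, \dots, n$.

Let us decompose $M^i = M^{i,c} + M^{i,d}$, with $M^{i,c}$ a continuous martingale and $M^{i,d}$ a purely discrete martingale. The process $V_2^i(t) := \langle M^i(t) \rangle$ is the compensator of the quadratic variation process $[M^i(t)] = \langle M^{i,c}(t) \rangle + \sum_{s \le t} |\Delta M^i(s)|^2$. If $k \ge 3$, we define $V_k^i(t)$ as the compensator of the $k$-variation process $\sum_{s \le t} |\Delta M^i(s)|^k$ of $M^i(t)$. Since $\Delta M^i(t) \in \{0,1\}$ for all $0 \le t \le 1$, the $V_k^i$ are all equal for $k \ge 3$ and such that $V_k^i(t) \le V_2^i(t)$, for all $k \ge 3$. The process $S_a^i(1)$ has been defined as the compensator of (28). As a consequence, we have:
$$\sum_{i=1}^n S_a^i(1) = \sum_{i=1}^n \sum_{k \ge 2} \frac{a^k}{k!} \int_0^1 |h(X_i,z)|^k\,dV_k^i(z) \le \sum_{i=1}^n \int_0^1 h(X_i,z)^2\,dV_2^i(z) \times \sum_{k \ge 2} \frac{a^k}{k!} \|h\|_{\infty,A}^{k-2}$$
and if $\|h\|_n \le \delta$
$$\sum_{i=1}^n S_a^i(1) \le \frac{n \delta^2 \|\alpha\|_{\infty,A}}{\|h\|_{\infty,A}^2} \big( \exp(a \|h\|_{\infty,A}) - 1 - a \|h\|_{\infty,A} \big).$$
The minimum of this bound minus $a n \epsilon$ for $a > 0$ is achieved by
$$a = \frac{1}{\|h\|_{\infty,A}} \log\Big( \frac{\epsilon \|h\|_{\infty,A}}{\delta^2 \|\alpha\|_{\infty,A}} + 1 \Big)$$
and is equal to
$$- \frac{n \delta^2 \|\alpha\|_{\infty,A}}{\|h\|_{\infty,A}^2}\, g\Big( \frac{\epsilon \|h\|_{\infty,A}}{\delta^2 \|\alpha\|_{\infty,A}} \Big)$$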
where we recall that $g(x) = (1 + x)\log(1 + x) - x$. This concludes the proof of the Bennett inequality. Inequality (26) follows from the fact that $g(x) \ge 3x^2/(2(x + 3))$ for any $x \ge 0$. To prove (27), we use the following trick from Birgé and Massart (1998): we have $g(x) \ge g_2(x)$ for any $x \ge 0$ where $g_2(x) := x + 1 - \sqrt{1 + 2x}$ and $g_2^{-1}(y) = \sqrt{2y} + y$. □
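
The elementary inequalities used to pass from the Bennett bound to (26) and (27) can be sanity-checked numerically. The sketch below evaluates them on a grid; it is a numerical illustration, not a proof. The last check uses the rescaled form $g(x) \ge 9 g_2(x/3)$, which is an assumption on how the factor $1/3$ in (27) arises and is not stated in the text above.

```python
import math


def g(x):
    """Bennett function g(x) = (1 + x) log(1 + x) - x."""
    return (1.0 + x) * math.log1p(x) - x


def bernstein_lower(x):
    """Lower bound 3 x^2 / (2 (x + 3)) used to derive (26)."""
    return 3.0 * x * x / (2.0 * (x + 3.0))


def g2(x):
    """Auxiliary function g_2(x) = x + 1 - sqrt(1 + 2x)."""
    return x + 1.0 - math.sqrt(1.0 + 2.0 * x)


def g2_inverse(y):
    """Claimed inverse of g_2 on [0, +infinity): y + sqrt(2y)."""
    return y + math.sqrt(2.0 * y)


grid = [0.01 * k for k in range(1, 5001)]   # x in (0, 50]
tol = 1e-12                                  # slack for floating-point rounding
ok_bernstein = all(g(x) >= bernstein_lower(x) - tol for x in grid)
ok_g2 = all(g(x) >= g2(x) - tol for x in grid)
ok_rescaled = all(g(x) >= 9.0 * g2(x / 3.0) - tol for x in grid)
ok_inverse = all(abs(g2(g2_inverse(y)) - y) <= 1e-9 * (1.0 + y) for y in grid)
print(ok_bernstein, ok_g2, ok_rescaled, ok_inverse)
```

A grid check of this kind only guards against transcription errors in the constants; the inequalities themselves follow from comparing series expansions or derivatives.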



6.2. Proof of Proposition 5 (maximal inequality via $L^2(\mu) - L^\infty$ chaining). Using a $L^2(\mu) - L^\infty$ chaining method, as in Barron et al. (1999) or Comte (2001), we obtain the following result, which leads to Proposition 5:

Lemma 3. Let $B^\mu_{m,m'}(0,1) = \{t \in S_m + S_{m'}, \|t\|_\mu \le 1\}$. Then
$$E\Big( \sup_{h \in B^\mu_{m,m'}(0,1)} (\nu_n^2(h) - p(m,m'))_+ 1(\Delta) \Big) \le C (1 + \|\alpha\|_{\infty,A}) \frac{e^{-D_{m'}}}{n}$$
where
$$p(m,m') = \kappa (1 + \|\alpha\|_{\infty,A}) \frac{D_m + D_{m'}}{n}.$$




Proof. The result of Lemma 3 is obtained from Inequality (26) by a $L^2(\mu) - L^\infty$ chaining technique. The method is analogous to the one given in Proposition 4 p. 282-287 in Comte (2001), in Theorem 5 in Birgé and Massart (1998) and in Proposition 7, Theorem 8 and Theorem 9 in Barron et al. (1999). Since the context is different, we give, for the sake of completeness, the details of the proof. It relies on the following lemma (Lemma 9 in Barron et al. (1999)):

Lemma (Barron et al. (1999)). Let $\mu$ be a positive measure on $[0,1]$. Let $(\psi_\lambda)_{\lambda \in \Lambda}$ be a finite orthonormal system in $L^2 \cap L^\infty(\mu)$ with $|\Lambda| = D$ and $\bar S$ be the linear span of $\{\psi_\lambda\}$. Let

(29) $\bar r = \dfrac{1}{\sqrt{D}} \sup_{\beta \ne 0} \dfrac{\|\sum_{\lambda \in \Lambda} \beta_\lambda \psi_\lambda\|_\infty}{|\beta|_\infty}.$

For any positive $\delta$, one can find a countable set $T \subset \bar S$ and a mapping $p$ from $\bar S$ to $T$ with the following properties:

• for any ball $B$ with radius $\sigma \ge 5\delta$,
$$|T \cap B| \le (B'\sigma/\delta)^D \text{ with } B' < 5,$$

• $\|u - p(u)\|_\mu \le \delta$ for all $u$ in $\bar S$, and
$$\sup_{u \in p^{-1}(t)} \|u - t\|_\infty \le \bar r \delta, \text{ for all } t \text{ in } T.$$

To use this lemma, the main difficulty is often to evaluate $\bar r$ in the different contexts. We consider a collection of product models $(S_m)_{m \in \mathcal{M}_n}$ which can be [DP] or [T]. For the sake of brevity, we omit collection [W] as it is quite similar to collection [DP]. Recall that $B^\mu_{m,m'}(0,1) = \{t \in S_m + S_{m'}, \|t\|_\mu \le 1\}$. We have to compute $\bar r = \bar r_{m,m'}$ corresponding to $\bar S = S_m + S_{m'} \subset S_n$ on which the norm connection holds. We denote by $D(m,m') = \dim(S_m + S_{m'})$.
Collection [DP] - As $S_m + S_{m'}$ is a linear space, an orthonormal $L^2(\mu)$-basis $(\psi_\lambda)_{\lambda \in \Lambda_n}$ can be built by orthonormalisation on each sub-rectangle of $(\varphi_\lambda)_{\lambda \in \Lambda_n}$, the orthonormal basis of $S_n$. Then



sup 



< 



AeA„ 



IV'aIIIoo.a < (r- + 1) sup 

A6A„ 



'JX\\oo,A 



< (r + l)3/2yAr sup 

A6A„ 

< (r + 1)3/2 yiV^ sup 

AeA„ 

< (r + 1)3/2 v/iVTb. 



^aIIm 



/Vfo 



Thus here $\bar r_{m,m'} \le ((r + 1)^{3/2}/\sqrt{f_0}) \sqrt{N_n / D(m,m')}$.
Collection [T] - For trigonometric polynomials, we write

II Easa,, /?A^/'a||oo,A 



sup 



\li\oo - \/7o|/3|oo 



< 



To 



Therefore, v^yi rni 



We may now prove Lemma 3. We apply the Lemma from Barron et al. (1999) to the linear space $S_m + S_{m'}$ of dimension $D(m,m')$ and norm connection measured by $\bar r_{m,m'}$ bounded above. We consider $\delta_k$-nets $T_k = T_{\delta_k} \cap B^\mu_{m,m'}(0,1)$, with $\delta_k = \delta_0 2^{-k}$ and $\delta_0 < 1/5$ (to be chosen later). Moreover we set $H_k = \log(|T_k|) \le D(m,m') \log(5/\delta_k) = D(m,m')[k \log(2) + \log(5/\delta_0)]$. Given some point $h \in B^\mu_{m,m'}(0,1)$, we can find a sequence $\{h_k\}_{k \ge 0}$ with $h_k \in T_k$ such that $\|h - h_k\|_\mu^2 \le \delta_k^2$ and $\|h - h_k\|_{\infty,A} \le \bar r_{m,m'} \delta_k$. Thus we have the following decomposition that holds for any $h \in B^\mu_{m,m'}(0,1)$:
$$h = h_0 + \sum_{k \ge 1} (h_k - h_{k-1}),$$
with $\|h_0\|_\mu \le 1$, $\|h_0\|_{\infty,A} \le \bar r_{m,m'}$, and
$$\|h_k - h_{k-1}\|_\mu^2 \le 2(\delta_k^2 + \delta_{k-1}^2) = 5\delta_{k-1}^2/2, \qquad \|h_k - h_{k-1}\|_{\infty,A} \le 3 \bar r_{m,m'} \delta_{k-1}/2$$
for any $k \ge 1$. In the sequel we denote by $P_\Delta(\cdot)$ the measure $P(\cdot \cap \Delta)$, see (16). Let in addition $(\eta_k)_{k \ge 0}$ be a sequence of positive numbers that will be chosen later on and $\eta$ such that $\eta_0 + \sum_{k \ge 1} \eta_k \le \eta$. We have:


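$$P_\Delta\Big( \sup_{h \in B^\mu_{m,m'}(0,1)} \nu_n(h) > \eta \Big) \le P_\Delta\Big( \exists (h_k)_{k \in \mathbb{N}} \in \prod_{k \in \mathbb{N}} T_k \ / \ \nu_n(h_0) + \sum_{k=1}^{+\infty} \nu_n(h_k - h_{k-1}) > \eta_0 + \sum_{k \ge 1} \eta_k \Big) \le P_1 + P_2$$
where
$$P_1 = \sum_{h_0 \in T_0} P_\Delta(\nu_n(h_0) > \eta_0), \qquad P_2 = \sum_{k=1}^{\infty} \sum_{h_{k-1} \in T_{k-1},\, h_k \in T_k} P_\Delta(\nu_n(h_k - h_{k-1}) > \eta_k).$$
Then using Inequality (27), we straightforwardly infer that $P_1 \le \exp(H_0 - n x_0)$ and $P_2 \le \sum_{k \ge 1} \exp(H_{k-1} + H_k - n x_k)$ if we choose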



m,m' 



Vk = (l/2)(5fc_i(Y/l5||a||oo,A2;fc + r(^rn,m')^k) ■ 

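Fix $u > 0$ and choose $x_0$ such that
$$n x_0 = H_0 + D_{m'} + u$$
and for $k \ge 1$, $x_k$ such that
$$n x_k = H_{k-1} + H_k + k D_{m'} + D_{m'} + u.$$
If $D_{m'} \ge 1$, we infer that
$$P_\Delta\Big( \sup_{h \in B^\mu_{m,m'}(0,1)} \nu_n(h) > \eta_0 + \sum_{k \ge 1} \eta_k \Big) \le e^{-D_{m'} - u} \Big( 1 + \sum_{k=1}^{\infty} e^{-k D_{m'}} \Big) \le 1.6\, e^{-D_{m'} - u}.$$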



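Now, it remains to compute $\sum_{k \ge 0} \eta_k$. We note that $\sum_{k=1}^{\infty} \delta_{k-1} = \sum_{k=0}^{\infty} \delta_k = 2\delta_0$. This implies that:
$$x_0 + \sum_{k=1}^{\infty} \delta_{k-1} x_k \le \Big[ \log(5/\delta_0) + \delta_0 \sum_{k=1}^{\infty} 2^{-(k-1)} [(2k-1)\log(2) + 2\log(5/\delta_0) + k] \Big] \frac{D(m,m')}{n} + \Big( 1 + \delta_0 \sum_{k \ge 1} 2^{-(k-1)} \Big) \frac{D_{m'}}{n} + \Big( 1 + \delta_0 \sum_{k \ge 1} 2^{-(k-1)} \Big) \frac{u}{n}$$

(30) $\le a(\delta_0) \dfrac{D(m,m')}{n} + \dfrac{1 + 2\delta_0}{n} (D_{m'} + u),$

where $a(\delta_0) = \log(5/\delta_0) + \delta_0 (4\log(5/\delta_0) + 6\log(2) + 4)$. This leads to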



fc=0 



2 1 
< - 
- 4 

1 



< 



< 



4 
15 

Tl 



V2(^y3||a||oo,Aa;o/2 + rm,m'XQ/3j + ^^h-iyy '^^W\\oo,AXk + 

k=l 

^ 00 00 

m,m' -iXk 



k=l k=l 

Xq ^y^Jk-iyXk) ||a||oo,yl + ^m,m'(^0 + X^^^l^^fc 
k=l k=0 

00 00 „ 



< 4 



2 (^xo + E 6k~iXkj ||a||oo,A + rm^m' [^0 + E Sk-lXk 

k=l k=l 
Now, fix $\delta_0 < 1/5$ (say, $\delta_0 = 1/10$) and use the bound (30). The bound for $(\sum_{k \ge 0} \eta_k)^2$ is less than a quantity proportional to:



D{m, m') ^ 



n 



n 



a 00, A + ?^ri 



2 (D{m,m') -Dm'\2 ||a||oo,A^i 



+ 



+ 



+ r 



n 



u 



n n 

For collection [DP], we use that $\bar r_{m,m'}^2 \le (r + 1)^3 N_n / (f_0 D(m,m'))$ and $N_n \le n/\log n$ to obtain the bound:



+ 



n 



n 



< c(r + 1)3 ^» D{m,m'y 
J foD^m, m') v? 



^ c{r + 1)'^ NnD{m^m') ^ c(r + 1)^ 1 D{rn,m') ^ D{m,m') 



fo 



fo log n n 



For collection [T], we have $\bar r_{m,m'} \le C \sqrt{N_n}$ and $N_n \le \sqrt{n}/\log n$. We get

^2 fD{m,m') , ^ CNnD{m,m'f ^ C D{m,m') ^ D{m,m') 



+ 



< 



n n J rfl logn n 

Thus, in both cases, the bound for $(\sum_{k \ge 0} \eta_k)^2$ is proportional to:



< 



n 



n 



(1 + ||a||oo,A 
We obtain, as D{m, m!) < Dm + D 

Pa 



D(m, m!) ^ D„ 



n 



+ 



\a\\oo,AU 

n 



+ r. 



u 



sup [fn(/l)]^ > k( (1 + ||a||oo,A) 
AeB" ,(0,1) 



Dm + i^m' . ||Q||oo,An _2 



n 



< Pa 



sup K(/i)]^ > 1]'^ 
heB^" ,(0,1) 



< 2 



sup fn(^) > V 
heB" ,(0,1) 



< 3.2e 



SO that, if we take Kq, := k(1 + ||a||oo,yl)) 



E 



sup ^^{h) — p{m,m')] 1(A) 
heB'' ,(0,1) ^ + 



< 



( sup v'^{h) > p{m,m!) + u\du 

^h^B" ,(0,1) ^ 

m..m,' ^ ' 



< e 



+ 



< e 



n 



00 2r^ 







2kq< / t" / 

' m..m.' 



< e 



™ n -I 

n n 



n Jo 

^ 

— ) 
n 

where $C$ is a constant depending on $\|\alpha\|_{\infty,A}$. This ends the proof of Lemma 3.

To conclude the proof of Proposition 5, we just have to bound $\sum_{m' \in \mathcal{M}_n} e^{-D_{m'}}$. This term is at most
$$\sum_{j,k \ge 1} e^{-jk} = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} e^{-jk} = \sum_{j=1}^{\infty} \frac{e^{-j}}{1 - e^{-j}} \le \frac{e^{-1}}{(1 - e^{-1})^2}.$$
□
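
The last bound on $\sum_{j,k \ge 1} e^{-jk}$ is easy to confirm numerically. The following sketch compares a truncated double sum, the closed form of the inner geometric series, and the final upper bound; the truncation levels are arbitrary choices made for the illustration.

```python
import math


def double_sum_brute(terms=60):
    """Truncated double sum of exp(-j*k) over 1 <= j, k <= terms."""
    return sum(math.exp(-j * k)
               for j in range(1, terms + 1)
               for k in range(1, terms + 1))


def double_sum_closed(terms=200):
    """Same sum with the inner geometric series in k summed exactly."""
    return sum(math.exp(-j) / (1.0 - math.exp(-j)) for j in range(1, terms + 1))


upper_bound = math.exp(-1.0) / (1.0 - math.exp(-1.0)) ** 2
print(double_sum_brute(), double_sum_closed(), upper_bound)
```

The sum is therefore bounded by a numerical constant that does not depend on $n$, which is what the proof of Proposition 5 requires.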



7. Proof of the auxiliary results

7.1. Proof of Proposition 1. Let $\hat f_{m^*}$ and $\hat f_0$ be defined by (12), with $m^* = (D_{m_1^*}, D_n^{(2)})$ with $\log n \le D_{m_1^*} \le n^{1/4}/\sqrt{\log n}$ and $D_n^{(2)} \le n^{1/4}/\sqrt{\log n}$, see $(\mathcal{M}_1)$. We remark that, for all $(x,z) \in A$,
$$\hat f_{m^*}(x,z) = f(x,z) + \hat f_{m^*}(x,z) - f(x,z) \ge f_0 - \|\hat f_{m^*} - f\|_{\infty,A}.$$
We deduce that $\|\hat f_{m^*} - f\|_{\infty,A} \ge f_0 - \hat f_0$. In the same manner, $\|\hat f_{m^*} - f\|_{\infty,A} \ge \hat f_0 - f_0$. Thus
$$P(\Omega^c) = P(|\hat f_0 - f_0| > f_0/2) \le P(\|\hat f_{m^*} - f\|_{\infty,A} > f_0/2).$$
Therefore, we just have to prove that $P(\|\hat f_{m^*} - f\|_{\infty,A} > f_0/2) \le C_k/n^k$.

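First remark that $\|\hat f_{m^*} - f\|_{\infty,A} \le \|\hat f_{m^*} - f_{m^*}\|_{\infty,A} + \|f_{m^*} - f\|_{\infty,A}$. As $f \in B^\beta_{2,\infty}(A)$ with $\bar\beta > 1$, the imbedding theorem proved in Nikol'skii (1975) p. 236 implies that $f$ belongs to $B^{\beta^*}_{\infty,\infty}(A)$ with $\beta_1^* = \beta_1(1 - 1/\bar\beta)$ and $\beta_2^* = \beta_2(1 - 1/\bar\beta)$. Then the approximation lemma of Lacour (2007) recalled in Section 5.2, which is still valid for the trigonometric polynomial spaces with the infinite norm instead of the $L^2$ norm, yields
$$\|f_{m^*} - f\|_{\infty,A} \le C \big( D_{m_1^*}^{-\beta_1^*} + (D_n^{(2)})^{-\beta_2^*} \big).$$
As we assumed that $D_{m_1^*} \ge \log n$, it follows that $\|f_{m^*} - f\|_{\infty,A}$ tends to zero when $n \to +\infty$. Thus, for $n$ large enough, we have $\|f_{m^*} - f\|_{\infty,A} \le f_0/4$ and
$$P(\|\hat f_{m^*} - f\|_{\infty,A} > f_0/2) \le P(\|\hat f_{m^*} - f_{m^*}\|_{\infty,A} > f_0/4).$$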

Now, following (7W2), we get 



Wfml /mj I 



oo,A 



< 



Now we define

(31) $\vartheta_n(h) = \dfrac{1}{n} \sum_{i=1}^n \int [h(X_i,y) Y^i(y) - E(h(X_i,y) Y^i(y))]\,dy = \|\sqrt{h}\|_n^2 - \|\sqrt{h}\|_\mu^2.$



With this notation, and recalling (11) and the proof of Proposition 2 in Section 3.4, we have

j,k j,k 



Thus 



mLi-f\\oo,A>fo/2) < r(Y^€{^T^€')> 



fi 



16(t>lMDmlV^^^)^ 



< Y.F(\Mv^f^^l.f)\> 



fo 



(2), 
Notice that $\vartheta_n(\varphi_j \otimes \psi_k) = \frac{1}{n} \sum_{i=1}^n (U_i^{j,k} - E(U_i^{j,k}))$, where $U_i^{j,k} = \varphi_j(X_i) \int \psi_k(y) Y^i(y)\,dy$ are i.i.d. r.v. We can apply the Bernstein inequality to $\vartheta_n$, i.e. to the i.i.d. r.v. $U_i^{j,k}$. Indeed, we have

\\ui%o. < WHuj \My)\dy<\Wj\\oo{ j ^l{y)dyf'^ <^fiJ^,:=c 

and E[{Ut''f] < ||/x||oo,A = v\ We get 

^(\M^T^^f)\ ^ m) ^ 2exp(-^J^) 

with X = fo / i^V 4'i4'2Dm*T^n ) and v and c are right above. That is: 

As both D^i and are less than n^^^/ ^/log(n), we obtain: 



F{n^) < 2D^.P(2) " ^° < 2V^exp - C"(log nf)< , , 

for any A; arbitrarily large, when n is large enough. 

Proof of Proposition |3| Note that is either or argmin^g^^ 1n{t)- Let us denote 
for short (pj := ip"^ and V'fc := V'™- In the second case, min Sp(Ga) > max(/o/3,n-V2) 
and thus 

1 " r 

< (minSp(G^^))-2||T^||2 < min(9//2, n) ^ (- (^.(X^) J i^k{z)dN\z 

j,k i=l 

< min(9//2,n)-^j;(^|(X,) / Mz)dN\, 

^ i=l j k 

< min(9//2,n)<^iP«- ( / i^k{z)dN\2 



Therefore, 



i=l \ k 

(32) < n'^l{vWfv(y-Y,Y.{ J Mz)dN^{ 



\ 2 
2\ 



i=l k 
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Now, we have: 



1=1 k 



i=l k 



i=l k 



E / ^k{z)a{X\z)Y\z)dz 



Using the Burkholder inequality as recalled in Liptser and Shiryayev (1989) p. 75, and the fact that the quadratic variation process of each $M^i$ is $N^i$ ($i = 1, \dots, n$), we obtain:



i=l k 
n 



i;kiz)a{X\ z)Y\z)dz 



i=l k '' i=l k 

^ ^'^^n^Y.^{{ E <{s)))+2^-Y,Y.^{{hk{z)a{X\z)Y\z)dz 

j=l k s:ANi{s)j^O i=l k 

< 2'alj2^[[ Y.'l't{s)))+2'lj2Y.^{{j Mz)a{X\z)Y\z)dz 

i=l s:ANi{s)^0 k i=l k 

< 2^C,M'Di?^flY.K{ E i))+2^1Y.EK{ MzM^\z)Y\z)d 



=1 s:AN'{s)j^O «=1 k 

< 2^aMl^l^''f-J2^{N'{l)) +2^-J2J2^{{ / MzMX\z)Y\z)d, 
1=1 1=1 k 

This yields, using Assumptions (^3) and (^4): 

^(^EE(/^'^(-)^^^(-))') 

i=l k '' 

< c(^Mi^'n^fm\^)) + T.^{{ [MzMx,z)Yiz)d 

k 

k k 

(33) < c(M'D^^^)MN\i)) + \Mt,M'Di?^f 



Then we have, by inserting (33) in (32),



1 " r 



i=l k 



< Cn2(p(i))2(p(2))3 < < c'n^ 
as we claim that we can reach $D_n^{(i)} \le \sqrt{n/\log(n)}$ in the case of localized bases [DP], [W],
[H]. Note that for basis [T], under (A41), the final order is much less (namely n^'^^ instead 
of 



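Proof of Proposition 4. Define, for $p > 1$, the set
$$\Delta_p = \Big\{ \forall h \in S_n, \ \Big| \frac{\|h\|_n^2}{\|h\|_\mu^2} - 1 \Big| \le 1 - 1/p \Big\},$$
where $S_n$ is the space of maximal dimension of the collection. Remark that $\Delta = \Delta_2$, see (16). First we observe that:
$$P(\Delta_p^c) \le P\Big( \sup_{h \in B^\mu_n(0,1)} |\vartheta_n(h^2)| > 1 - 1/p \Big)$$
where $\vartheta_n(\cdot)$ is defined by (31) and $B^\mu_n(0,1) = \{t \in S_n, \|t\|_\mu \le 1\}$. We denote by $(\varphi_j \otimes \psi_k)$ the $L^2$-orthonormal basis of $S_n$. If $h(x,y) = \sum_{j,k} a_{j,k} \varphi_j(x) \psi_k(y)$, then

(34) $\vartheta_n(h^2) = \sum_{j,k,j',k'} a_{j,k} a_{j',k'} \vartheta_n((\varphi_j \otimes \psi_k)(\varphi_{j'} \otimes \psi_{k'})).$

We obtain

(35) $\sup_{h \in B^\mu_n(0,1)} |\vartheta_n(h^2)| \le f_0^{-1} \sup_{\sum a_{j,k}^2 \le 1} \Big| \sum_{j,k,j',k'} a_{j,k} a_{j',k'} \vartheta_n((\varphi_j \otimes \psi_k)(\varphi_{j'} \otimes \psi_{k'})) \Big|.$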



Lemma (Baraud et al. (2001a|). Let Bjji = \\^j^j'\\oci,A and Vjj' = \\ipjipjr\\2. Let, for 
any symmetric matrix (^jj') 

p{A) := sup Y\bjbf\Ajj, 

and L{ip) := max{p'^ (V) , p{B)} . Then, if {A42) is satisfied, we have L{ip) < (pi(Vn^)'^, 
and L{ip) < 5cl)f'Dn \ if the basis is localized (cases [P] or [W]). 

Let us define 

/o'(l - I/P? 



X := 



and 



e 



4||/x||oo,A(^i'^)2L((^) 

V(i, A;)V(/, k') IMi^j ® i^k){v>j' ® i^k'))\ < ^[Bjj,x + Vjj,^2\\fx\\oc,Ax)]. 



Starting from (35), we have, on G: 



sup |i?n(/i^)| < ^ sup y^(y^ \aj^kaj',k'\)( Bjj'X + Vj,j'\/2 



E^,fc^l k,k' 



tx\\oo,AX 
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^2 / 



Thus setting bj = \(ij,k\, we have bj < Vn and it follows that, on Q, 



sup \Mh^)\ < fo^^l?^ sup y^\bjby\(Bjj,x + Vj^j>J2\\fx\\oo,AX 



< (1-1/P)Q + -^)<(1-1/P) 



71. 



Therefore, 



'( sup |l?n(t')| > 1- -) <n0'^). 



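Let $\phi_\lambda = \varphi_j \otimes \psi_k$ for $\lambda = (j,k)$. To bound $P(\vartheta_n(\phi_\lambda \phi_{\lambda'}) \ge B_{j,j'} x + V_{j,j'} \sqrt{2 \|f_X\|_{\infty,A}\, x})$, we will apply the Bernstein inequality given in Birgé and Massart (1998) to the i.i.d. r.v.

(36) $U_i^{\lambda,\lambda'} = U_i^{(j,k),(j',k')} = \varphi_j(X_i) \varphi_{j'}(X_i) \int \psi_k(y) \psi_{k'}(y) Y^i(y)\,dy.$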

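Under (A4), the r.v. are bounded
$$|U_i^{\lambda,\lambda'}| \le \|\varphi_j \varphi_{j'}\|_{\infty,A} \int |\psi_k(y) \psi_{k'}(y)|\,dy \le \|\varphi_j \varphi_{j'}\|_{\infty,A} = B_{j,j'}.$$
Moreover, using (A4) again, we obtain:
$$(U_i^{\lambda,\lambda'})^2 \le (\varphi_j(X_i) \varphi_{j'}(X_i))^2 \int \psi_k^2(y)\,dy \int \psi_{k'}^2(y)\,dy = (\varphi_j(X_i) \varphi_{j'}(X_i))^2$$
and thus
$$E[(U_i^{\lambda,\lambda'})^2] \le E[(\varphi_j(X_i) \varphi_{j'}(X_i))^2] \le \|f_X\|_{\infty,A} V_{j,j'}^2.$$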

We get 



Given that P(a5) < P(eC) = T.x,x'^(\MMx')\ > Bj,j'X + Vj,,,y^nM oo,AX^ , we can 
write: 

n/2(l - l/pf ^ 



- ^ 4||/^|U,^(Pi^))U(^)J 
< 2n^exp<^ ,11, II jTy, \. 

Following the lemma of Baraud et al. ( 2001a| ) above, and using Assumption (A4i), we 
have 

{V'i)fL{^) < <t>i{V'i)V^^)f < <Ain/log2(n). 
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And then, we have for any k arbitrarily large, when n is large enough, 

(37) F(Aj) < 2n^exp{ - " '^f log^(n)} < ^. 

Now, if the basis is localized, the result is better. In this case, L{ip) < 5(l)f'Dn \ Moreover, 
take histogram basis in (34), then all terms with k ^ k' vanish and then we can take 
bj = (X^fc a^fc)"*^^^ directly. Then, as then < 1, we obtain 

F(A,) < 2(P( )) Vi ) exp I - < 2n exp | - ^^^^^r^^j- 

Thus -/^(v') < 5(/)f2?i^^ < i?!)in/ log^(n) is enough to get (37) again. The proof is easy to 
extend to any localized basis as [P] or [1^] , (with Vn^ in the bound of 6| replaced by 
r + 1 in case [P] for instance). 

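Proof of Lemma 1. Let $m \in \mathcal{M}_n$ be fixed and let $\ell$ be an eigenvalue of $G_m$. There exists $A_m \ne 0$ with coefficients $(a_\lambda)_\lambda$ such that $G_m A_m = \ell A_m$ and thus $A_m^\top G_m A_m = \ell A_m^\top A_m$. Now, take $h := \sum_\lambda a_\lambda \varphi_\lambda \in S_m$. We have $\|h\|_n^2 = A_m^\top G_m A_m$ and $\|h\|_A^2 = A_m^\top A_m$. Thus, on $\Delta$ (see (16)):
$$A_m^\top G_m A_m = \|h\|_n^2 \ge \frac{1}{2} \|h\|_\mu^2 \ge \frac{1}{2} f_0 \|h\|_A^2 = \frac{1}{2} f_0 A_m^\top A_m.$$
Therefore, on $\Delta$, for all $m \in \mathcal{M}_n$, we have $\min \mathrm{Sp}(G_m) \ge f_0/2$. Moreover, on $\Omega$ we have $f_0 \ge 2\hat f_0/3$ and $\max(\hat f_0/3, n^{-1/2}) = \hat f_0/3$, for $n \ge 36/f_0^2$. □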